Abstract: As more emerging applications are moving to GPUs, fine-grained synchronization has become imperative. However, their performance can be severely impaired in case of frequent synchronization failures caused by high data contention. Differently from CPUs, GPUs own thousands of hardware threads and adopt single instruction multiple threads paradigm, making it impractical to deploy the CPU contention management mechanisms directly on GPUs. In this article, we design a Software Warp Controlling Framework (SWCF), which employs producer-consumer execu...
(read more)
Topics: 
Parallel computing
Distributed computing