The large-scale parallelism of modern GPUs has the potential to introduce concurrency errors. Prior work has studied the detection of data races within a GPU kernel using software-only or hardware-only support, but it has mostly ignored the challenges of detecting races on unified shared memory (USM). Traditional techniques that track the ‘happens-before’ relationship may not scale well for detecting data races across CPU and GPU threads.
In this talk, we discuss our proposed approach, which uses collision analysis to detect USM data races on Intel GPUs. The approach is synchronization-oblivious and is easily extended to evaluate other design choices, such as sampling and capping the overhead introduced. The analysis should help enhance the debugging toolchain for Data Parallel C++ (DPC++) programs with oneAPI.
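To make the idea of collision analysis concrete, the following is a minimal host-side sketch in C++ and not the implementation presented in the talk. It assumes a small shared table of in-flight accesses: each instrumented access announces its address, then checks whether a conflicting access (same address, at least one write) is in flight at the same time. The class `CollisionDetector`, its methods, and the slot count are hypothetical names chosen for illustration. No locks or barriers are tracked, which is the sense in which such a scheme is synchronization-oblivious.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: every name here is illustrative, not from the talk.
class CollisionDetector {
public:
    static constexpr std::size_t kSlots = 8;  // assumed table size
    static constexpr int kNoSlot = -1;

    struct Result {
        bool collided;  // true if a conflicting access was in flight
        int slot;       // slot to pass to end_access, or kNoSlot
    };

    CollisionDetector() {
        for (auto& s : slots_) s.store(kEmpty, std::memory_order_relaxed);
    }

    // Announce an access, then scan for conflicting in-flight accesses.
    // Publishing before scanning means that when two accesses overlap,
    // at least one of them observes the other.
    Result begin_access(std::uintptr_t addr, bool is_write) {
        const std::uint64_t entry = encode(addr, is_write);
        int mine = kNoSlot;
        for (std::size_t i = 0; i < kSlots; ++i) {
            std::uint64_t expected = kEmpty;
            if (slots_[i].compare_exchange_strong(expected, entry,
                                                  std::memory_order_acq_rel)) {
                mine = static_cast<int>(i);
                break;
            }
        }
        bool collided = false;
        for (std::size_t i = 0; i < kSlots; ++i) {
            if (static_cast<int>(i) == mine) continue;
            const std::uint64_t other = slots_[i].load(std::memory_order_acquire);
            if (other == kEmpty) continue;
            const bool same_addr = (other >> 1) == (entry >> 1);
            const bool other_writes = (other & 1u) != 0;
            // Two reads never race; any other same-address pair does.
            if (same_addr && (is_write || other_writes)) collided = true;
        }
        return Result{collided, mine};
    }

    // Withdraw the announcement once the access has completed.
    void end_access(int slot) {
        if (slot == kNoSlot) return;  // table was full; nothing to release
        assert(slot >= 0 && static_cast<std::size_t>(slot) < kSlots);
        slots_[static_cast<std::size_t>(slot)].store(kEmpty,
                                                     std::memory_order_release);
    }

private:
    static constexpr std::uint64_t kEmpty = ~std::uint64_t{0};

    // Pack address and access type into one word (assumes the address
    // fits in 63 bits, which holds for typical user-space pointers).
    static std::uint64_t encode(std::uintptr_t addr, bool is_write) {
        return (static_cast<std::uint64_t>(addr) << 1) | (is_write ? 1u : 0u);
    }

    std::array<std::atomic<std::uint64_t>, kSlots> slots_;
};
```

In use, an instrumented thread would call `begin_access`, pause briefly to widen the window in which a conflicting access can arrive, perform the access, and then call `end_access`. A reported collision is a race actually observed at run time. Races whose accesses never overlap during a run go undetected, which is the usual trade-off of this style of analysis and why sampling and overhead caps are natural extensions.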
Learning objectives: We will describe the challenges of efficient data race detection on unified shared memory systems, and a proposed solution that aims to implement an efficient data race detection technique for USM systems on Intel architectures. Developers will walk away with challenges and ideas in developing efficient concurrency analysis, and a prototype for debugging heterogeneous CPU/GPU code.
Configuration used – CPU: Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz, GPU: Intel(R) UHD Graphics P630, OS: Ubuntu 20.04