General-purpose programming on GPUs (GPGPU) is becoming increasingly in vogue as applications such as machine learning and scientific computing demand high throughput in vector-parallel applications. NVIDIA’s CUDA toolkit seeks to make GPGPU programming accessible by allowing programmers to write GPU functions, called kernels, in a small extension of C/C++. However, due to CUDA’s complex execution model, the performance characteristics of CUDA kernels are difficult to predict, especially for novice programmers.
This paper introduces a novel quantitative program logic for CUDA kernels, which allows programmers to reason about both functional correctness and resource usage of CUDA kernels, paying particular attention to a set of common but CUDA-specific performance bottlenecks. The logic is proved sound with respect to a novel operational cost semantics for CUDA kernels. The semantics, logic and soundness proofs are formalized in Coq. An inference algorithm based on LP solving automatically synthesizes symbolic resource bounds by generating derivations in the logic. This algorithm is the basis of an end-to-end resource-analysis tool for kernels, which has been implemented using an existing resource-analysis tool for imperative programs. An experimental evaluation on a suite of CUDA benchmarks shows that the analysis is effective in aiding the detection of performance bugs in CUDA kernels.
Thu 21 JanDisplayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change
16:00 - 17:00 | |||
16:00 10mTalk | Verifying Observational Robustness Against a C11-style Memory Model POPL Link to publication DOI | ||
16:10 10mTalk | Provably Space Efficient Parallel Functional ProgrammingDistinguished Paper POPL Link to publication DOI | ||
16:20 10mTalk | Modeling and Analyzing Evaluation Cost of CUDA Kernels POPL Link to publication DOI | ||
16:30 10mTalk | Optimal Prediction of Synchronization-Preserving Races POPL Umang Mathur University of Illinois at Urbana-Champaign, Andreas Pavlogiannis Aarhus University, Mahesh Viswanathan University of Illinois at Urbana-Champaign Link to publication DOI Pre-print | ||
16:40 10mTalk | Taming x86-TSO Persistency POPL Link to publication DOI Pre-print | ||
16:50 10mBreak | Break POPL |