2 years ago
#77500
Michael
Determining CUDA compute capability as constexpr for __launch_bounds__
To launch a CUDA kernel efficiently, I'd like to use __launch_bounds__ with arguments that depend on the maximum number of threads per SM supported by the current GPU, which in turn depends on that GPU's compute capability.
One way to get that number is the cudaDeviceProp structure returned by cudaGetDeviceProperties. Unfortunately, that won't work here: __launch_bounds__ requires its arguments to be compile-time constants (constexpr), whereas cudaGetDeviceProperties is a runtime call, so its result cannot be used to specify __launch_bounds__.
Hence the question: how do I determine either the maximum number of threads per SM (preferred) or the compute capability at compile time, so that I can pass it to __launch_bounds__?
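One direction that might work is sketched below. It relies on the __CUDA_ARCH__ macro, which nvcc defines during each device compilation pass (for example 750 for sm_75) and leaves undefined during the host pass. The helper max_threads_per_sm, the constants kArch, kBlockSize and kMinBlocksPerSm, the kernel scale_kernel, and the conservative fallback of 1024 are all hypothetical names and choices made for illustration. The per-architecture values are my reading of the compute capability table in the CUDA C++ Programming Guide and should be checked against the toolkit version in use.

```cpp
// Sketch only: a compile-time lookup of "maximum resident threads per SM".
// The table values are an assumption to verify against the CUDA C++
// Programming Guide; unknown architectures get a conservative fallback.
constexpr int max_threads_per_sm(int arch) {
    return arch == 750                                  ? 1024   // sm_75
         : (arch == 860 || arch == 870 || arch == 890)  ? 1536   // sm_86/87/89
         : (arch >= 300 && arch <= 720)                 ? 2048   // sm_30 .. sm_72
         : (arch == 800 || arch == 900)                 ? 2048   // sm_80, sm_90
                                                        : 1024;  // fallback (assumed)
}

// __CUDA_ARCH__ exists only in device passes; the host pass sees 0 here
// and therefore uses the fallback, which has no effect on device code.
#ifdef __CUDA_ARCH__
constexpr int kArch = __CUDA_ARCH__;
#else
constexpr int kArch = 0;
#endif

constexpr int kBlockSize      = 256;  // hypothetical block size for this kernel
constexpr int kMinBlocksPerSm = max_threads_per_sm(kArch) / kBlockSize;

// Everything above is evaluated at compile time.
static_assert(max_threads_per_sm(750) == 1024, "sm_75 entry");
static_assert(kMinBlocksPerSm >= 1, "block size exceeds the per-SM thread limit");

#ifdef __CUDACC__
// Hypothetical kernel: both launch-bounds arguments are constant expressions.
__global__ void __launch_bounds__(kBlockSize, kMinBlocksPerSm)
scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}
#endif
```

When building for several architectures at once, each device pass sees its own __CUDA_ARCH__ value, so the bounds are specialised per target. Asking for full occupancy this way caps the registers available per thread and may cause spilling, so the second argument probably needs tuning rather than being taken straight from the table. The host-side launch configuration still has to be chosen at runtime, for example from cudaGetDeviceProperties.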
c++
cuda
0 Answers