2 years ago

#77500


Michael

Determining CUDA compute capability as constexpr for __launch_bounds__

To launch a CUDA kernel efficiently, I'd like to use __launch_bounds__ with arguments that depend on the maximum number of resident threads per SM on the current GPU, which in turn depends on that GPU's compute capability.

One way to get that value is from the cudaDeviceProp structure filled in by cudaGetDeviceProperties (its maxThreadsPerMultiProcessor field). Unfortunately, that won't work here: __launch_bounds__ requires its arguments to be compile-time constants (constexpr), while cudaGetDeviceProperties is a runtime call. So I cannot use cudaGetDeviceProperties to specify __launch_bounds__.

So the question is: how do I determine, at compile time, either the maximum number of threads per SM (preferred) or the compute capability, so that I can pass it to __launch_bounds__?
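One compile-time source for this is the __CUDA_ARCH__ macro, which nvcc defines only while compiling device code for a particular target (for example 860 for sm_86). The CUDA C++ Programming Guide's launch-bounds section uses it the same way, choosing __launch_bounds__ arguments under #if __CUDA_ARCH__. The following is a minimal sketch, assuming a fixed block size of 256 and turning the per-SM thread limit into the second __launch_bounds__ argument (minimum resident blocks per SM). The names max_threads_per_sm, kTargetArch, kBlockSize, kMinBlocksPerSM and the scale kernel are hypothetical. The table is transcribed for compute capabilities 5.0 to 9.0 only, and anything unlisted falls back to a conservative 1024.

```cpp
// Hypothetical helper: compiles as plain C++ (g++/clang++) and as CUDA (nvcc).

// Maximum resident threads per SM for a __CUDA_ARCH__-style number
// (e.g. 860 == compute capability 8.6). Values cover 5.0 to 9.0;
// unknown targets (and the host pass, arch == 0) get a conservative 1024.
constexpr int max_threads_per_sm(int arch) {
    return arch == 750                   ? 1024   // 7.5
         : (arch == 860 || arch == 890)  ? 1536   // 8.6, 8.9
         : (arch >= 500 && arch <= 900)  ? 2048   // 5.x, 6.x, 7.0/7.2, 8.0, 8.7, 9.0
         : 1024;                                  // unlisted: assumption
}

#ifdef __CUDA_ARCH__
constexpr int kTargetArch = __CUDA_ARCH__;  // device pass: target being compiled
#else
constexpr int kTargetArch = 0;              // host pass: __CUDA_ARCH__ is undefined
#endif

constexpr int kBlockSize = 256;  // hypothetical block size chosen by the caller
constexpr int kMinBlocksPerSM = max_threads_per_sm(kTargetArch) / kBlockSize;

#ifdef __CUDACC__
// Both arguments are constant expressions, as __launch_bounds__ requires.
__global__ void __launch_bounds__(kBlockSize, kMinBlocksPerSM)
scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}
#endif
```

Two caveats apply to this approach. First, __CUDA_ARCH__ describes the architecture the code is being compiled for, not the GPU that eventually runs it (PTX compiled for an older target can be JIT-compiled on a newer GPU). Second, because the macro is undefined in the host pass, host code should not reuse kMinBlocksPerSM for the launch configuration. The programming guide makes the same point and recommends querying the device at runtime for that.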

c++

cuda
