Is It Safe To Implement Cuda Gridsync() In Numba Like This
Solution 1:
I think that Robert Crovella's comment points to the correct answer to why this method will fail.
I was incorrectly assuming the scheduler did pre-emptive multi-tasking so that all blocks would get a time slice to run in.
Currently, Nvidia GPUs do not have pre-emptive multi-tasking schedulers: thread blocks run to completion.
Thus it is possible that, once enough blocks have entered the while loop to wait, the remaining blocks will never be launched by the scheduler, and the waiting blocks will spin forever.
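The deadlock can be sketched with a pure-Python analogy (a made-up simulation, not GPU code): a fixed-size thread pool stands in for the run-to-completion scheduler, and the "blocks", their counts, and the timeout are all invented here for illustration. With more blocks than pool workers, the blocks that reach the spin-wait first occupy every worker and starve the rest, so the arrival counter never reaches its target:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_BLOCKS = 4    # "blocks" the kernel was launched with
RESIDENT = 2      # blocks the simulated GPU can keep resident at once
TIMEOUT = 0.5     # escape hatch so this demo terminates instead of hanging

counter = 0
lock = threading.Lock()
gave_up = []

def block(block_id):
    """One 'thread block': arrive at the barrier, then spin-wait for the rest."""
    global counter
    with lock:
        counter += 1
    deadline = time.monotonic() + TIMEOUT
    while True:                       # the naive grid-sync wait loop
        with lock:
            if counter == NUM_BLOCKS:
                return                # everyone arrived; barrier passed
        if time.monotonic() > deadline:
            gave_up.append(block_id)  # real GPU code would hang here forever
            return
        time.sleep(0.01)

# Run-to-completion "scheduler": only RESIDENT blocks execute at a time,
# and a running block is never pre-empted to let another one start.
with ThreadPoolExecutor(max_workers=RESIDENT) as pool:
    pool.map(block, range(NUM_BLOCKS))

print("blocks that would have deadlocked:", sorted(gave_up))
```

The first two blocks fill both workers and spin until the timeout, because blocks 2 and 3 cannot start until a worker frees up; on a real GPU the spin loop has no such escape hatch and the kernel hangs.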
I see there are research papers suggesting how Nvidia could make its scheduler pre-emptive: https://www.computer.org/csdl/proceedings/snpd/2012/2120/00/06299288.pdf But evidently that's not the case right now.
I am left wondering how CUDA C manages to pull off its grid-wide sync (cooperative_groups::this_grid().sync()). If it can be done in C, there must be some generic way to work around these limitations. This is a mystery I hope someone comments on below.
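For what it's worth, recent versions of Numba expose CUDA's cooperative-groups grid sync directly, which sidesteps the hand-rolled barrier entirely; CUDA makes it safe by using a cooperative launch, which is rejected unless every block in the grid can be resident on the device simultaneously. A hedged sketch, assuming a CUDA-capable GPU and a Numba build with cooperative-groups support (the kernel name, array sizes, and launch configuration here are arbitrary):

```python
from numba import cuda
import numpy as np

@cuda.jit
def two_phase(a, out):
    i = cuda.grid(1)
    if i < a.size:
        a[i] = i                 # phase 1: each thread, in any block, writes
    g = cuda.cg.this_grid()      # cooperative-groups grid handle
    g.sync()                     # grid-wide barrier: waits for every block
    if i == 0:
        out[0] = a[a.size - 1]   # safely read a write from another block

a = cuda.to_device(np.zeros(1024, dtype=np.int64))
out = cuda.to_device(np.zeros(1, dtype=np.int64))
# Numba performs a cooperative launch when grid.sync() is used; the
# launch fails if the 8 blocks cannot all be resident at once.
two_phase[8, 128](a, out)
print(out.copy_to_host()[0])     # expected: 1023 on a working GPU
```

The resident-blocks restriction is presumably what makes this legal where the spin-wait is not: no block ever has to wait for a block that hasn't been scheduled yet.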
It's really a shame to leave a 1000x speedup on the table.