Is It Safe To Implement Cuda Gridsync() In Numba Like This
Solution 1:
I think that Robert Crovella's comment points to the correct answer to why this method will fail.
I was incorrectly assuming the scheduler did pre-emptive multi-tasking so that all blocks would get a time slice to run in.
Currently, Nvidia GPUs do not have pre-emptive multi-tasking schedulers: thread blocks run to completion.
Thus it is possible that, once enough blocks have entered the while loop to wait, the remaining blocks will never be launched by the scheduler, and the waiting blocks will spin forever.
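The deadlock can be sketched with a pure-Python analogy (a made-up simulation, not GPU code): a fixed-size thread pool stands in for the run-to-completion scheduler, and the "blocks", their counts, and the timeout are all invented here for illustration. With more blocks than pool workers, the blocks that reach the spin-wait first occupy every worker and starve the rest, so the arrival counter never reaches its target:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_BLOCKS = 4    # "blocks" the kernel was launched with
RESIDENT = 2      # blocks the simulated GPU can keep resident at once
TIMEOUT = 0.5     # escape hatch so this demo terminates instead of hanging

counter = 0
lock = threading.Lock()
gave_up = []

def block(block_id):
    """One 'thread block': arrive at the barrier, then spin-wait for the rest."""
    global counter
    with lock:
        counter += 1
    deadline = time.monotonic() + TIMEOUT
    while True:                       # the naive grid-sync wait loop
        with lock:
            if counter == NUM_BLOCKS:
                return                # everyone arrived; barrier passed
        if time.monotonic() > deadline:
            gave_up.append(block_id)  # real GPU code would hang here forever
            return
        time.sleep(0.01)

# Run-to-completion "scheduler": only RESIDENT blocks execute at a time,
# and a running block is never pre-empted to let another one start.
with ThreadPoolExecutor(max_workers=RESIDENT) as pool:
    pool.map(block, range(NUM_BLOCKS))

print("blocks that would have deadlocked:", sorted(gave_up))
```

The first two blocks fill both workers and spin until the timeout, because blocks 2 and 3 cannot start until a worker frees up; on a real GPU the spin loop has no such escape hatch and the kernel hangs.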
I see there are research papers suggesting how Nvidia could make its scheduler pre-emptive: https://www.computer.org/csdl/proceedings/snpd/2012/2120/00/06299288.pdf But evidently that's not the case right now.
I am left wondering how CUDA C manages to pull off its grid-wide sync (cooperative_groups::this_grid().sync()). If it can be done in C, there must be some generic way to work around these limitations. This is a mystery I hope someone comments on below.
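For what it's worth, recent versions of Numba expose CUDA's cooperative-groups grid sync directly, which sidesteps the hand-rolled barrier entirely; CUDA makes it safe by using a cooperative launch, which is rejected unless every block in the grid can be resident on the device simultaneously. A hedged sketch, assuming a CUDA-capable GPU and a Numba build with cooperative-groups support (the kernel name, array sizes, and launch configuration here are arbitrary):

```python
from numba import cuda
import numpy as np

@cuda.jit
def two_phase(a, out):
    i = cuda.grid(1)
    if i < a.size:
        a[i] = i                 # phase 1: each thread, in any block, writes
    g = cuda.cg.this_grid()      # cooperative-groups grid handle
    g.sync()                     # grid-wide barrier: waits for every block
    if i == 0:
        out[0] = a[a.size - 1]   # safely read a write from another block

a = cuda.to_device(np.zeros(1024, dtype=np.int64))
out = cuda.to_device(np.zeros(1, dtype=np.int64))
# Numba performs a cooperative launch when grid.sync() is used; the
# launch fails if the 8 blocks cannot all be resident at once.
two_phase[8, 128](a, out)
print(out.copy_to_host()[0])     # expected: 1023 on a working GPU
```

The resident-blocks restriction is presumably what makes this legal where the spin-wait is not: no block ever has to wait for a block that hasn't been scheduled yet.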
It's really a shame to leave a 1000x speedup on the table.