Yeah, I tried that and got a CUDA out-of-memory error. Is there a workaround? I have more than enough shared memory.