Very interesting and cool insights, as always! I'm curious about the FreeBASIC or inline assembly way of making it faster, as I presume this is how FastExt does this effect.
There is also an idea from Fredborg, which RemiD described before, whose outcome I'm very interested in; it's quoted below in case you want to look into it. I guess you could use some form of light trails effect for the rays, and perhaps you can have a go at it! :)
"the idea was to have a subdivided quad parented to the camera, have its vertices colored with the sun color, and use linepicks from the sun to each vertex, and set the vertices alphas accordingly (if a light ray can reach a vertex, alpha 0.5, if a light ray can't reach a vertex, alpha 0)
with blendmode add or multiply2..."
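
To make the quoted idea a bit more concrete, here is a rough sketch of just the per-vertex visibility step, written in plain Python rather than Blitz3D so it can be run anywhere. Everything in it is an assumption for illustration: the function names (`quad_vertices`, `segment_hits_sphere`, `vertex_alphas`) are made up, the occluders are simple spheres standing in for real scene geometry, and the segment test stands in for what a linepick from the sun to each vertex would do. In Blitz3D the result would presumably be written back per vertex with something like VertexColor, which this sketch does not attempt.

```python
def quad_vertices(center, width, height, cols, rows):
    """Vertices of a subdivided quad facing the z axis (hypothetical layout).

    Returns (cols + 1) * (rows + 1) points, row by row.
    """
    cx, cy, cz = center
    verts = []
    for j in range(rows + 1):
        for i in range(cols + 1):
            x = cx - width / 2 + width * i / cols
            y = cy - height / 2 + height * j / rows
            verts.append((x, y, cz))
    return verts


def segment_hits_sphere(p0, p1, center, radius):
    """True if the segment p0 -> p1 passes within `radius` of `center`.

    Stand-in for a linepick against real geometry.
    """
    d = [b - a for a, b in zip(p0, p1)]        # segment direction
    f = [a - c for a, c in zip(p0, center)]    # sphere centre -> segment start
    len2 = sum(x * x for x in d)
    if len2 == 0.0:
        return sum(x * x for x in f) <= radius * radius
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = -sum(x * y for x, y in zip(f, d)) / len2
    t = max(0.0, min(1.0, t))
    closest = [p + t * x for p, x in zip(p0, d)]
    dist2 = sum((q - c) ** 2 for q, c in zip(closest, center))
    return dist2 <= radius * radius


def vertex_alphas(sun, verts, occluders, lit_alpha=0.5):
    """Alpha per vertex: lit_alpha if the sun reaches it, 0.0 otherwise."""
    alphas = []
    for v in verts:
        blocked = any(segment_hits_sphere(sun, v, c, r) for c, r in occluders)
        alphas.append(0.0 if blocked else lit_alpha)
    return alphas


if __name__ == "__main__":
    sun = (0.0, 0.0, 100.0)
    verts = quad_vertices((0.0, 0.0, 0.0), 4.0, 4.0, 2, 2)
    occluders = [((0.0, 0.0, 50.0), 0.5)]  # one small sphere between sun and quad
    for v, a in zip(verts, vertex_alphas(sun, verts, occluders)):
        print(v, a)
```

The cost is one ray test per vertex per frame, so the subdivision level of the quad is the obvious knob to trade quality against speed.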