Post by jfkEO1010etc in God-Rays for Blitz3D comments


Oh, and BTW (sorry for the text avalanche), while I'm thinking about it, I might as well try an entirely different approach that uses only one render. Take a point-sampled mini copy of the main render from the backbuffer, e.g. 256x256 pixels, then scale it down to 128x128 while smoothing the point samples (blur-shrink). I'd probably do that with inline assembler, as you can in GFA BASIC and probably in FreeBASIC, which can build a DLL that Blitz3D can then call via userlib decls. And when reading the point samples from the render, ignore anything that isn't bright enough (i.e. set it to RGB 0).
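To make the threshold plus blur-shrink step concrete, here is a minimal sketch in Python rather than ASM or FreeBASIC. It works on one grayscale channel stored as a list of rows; the function name `blur_shrink`, the 2x2 box average and the threshold value are assumptions for illustration, not code from the thread. A real version would read the pixels from the backbuffer and handle each color channel.

```python
def blur_shrink(img: list[list[int]], threshold: int) -> list[list[int]]:
    """Halve a grayscale image by averaging each 2x2 block of point samples.

    Pixels darker than `threshold` are set to 0 first, so only bright areas
    (sky, sun) end up in the ray texture. Assumes even width and height.
    """
    out = []
    for y in range(0, len(img), 2):
        row = []
        for x in range(0, len(img[0]), 2):
            block = (img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
            # Ignore anything that isn't bright enough (treat it as RGB 0).
            kept = [p if p >= threshold else 0 for p in block]
            row.append(sum(kept) // 4)  # box average: the "blur" part of blur-shrink
        out.append(row)
    return out


sample = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [100, 200, 250, 250],
    [50, 0, 250, 250],
]
print(blur_shrink(sample, threshold=128))  # [[255, 0], [50, 250]]
```

Applied to a 256x256 sample, one call gives the 128x128 texture; the bottom-left block shows how dim pixels drop out before averaging.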

jfkEO1010etc, 4 years ago

OK, in case anyone is interested: I tried the above idea (not the ASM part), and two things became clear. First, doing only one render inevitably causes recursive feedback: the sampled points get brightened by the rays and are then sampled again, frame after frame. That forced me to use a very low alpha, and even then it only stabilizes thanks to rounding errors, which makes it flicker wildly. So I concluded there is no way around a second render.
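The feedback problem can be illustrated with a deliberately simplified model of a single pixel: each frame shows the scene plus the rays, and next frame's rays are built from what was just shown. The function `simulate_feedback` and the "rays = alpha times the sample" rule are assumptions, not the actual effect code, and the model does not try to reproduce the rounding-driven flicker. It only shows why the alpha has to be so low: without clamping, the pixel settles near scene / (1 - alpha), and a large alpha simply saturates.

```python
def simulate_feedback(scene: int, alpha: float, frames: int) -> list[int]:
    """Brightness of one pixel over successive frames when a single render is
    both sampled for the rays and drawn with the rays added on top.

    Simplified model: rays added next frame = alpha * (what was shown this frame),
    and what was shown already contains this frame's rays.
    """
    history = []
    rays = 0
    for _ in range(frames):
        shown = min(255, scene + rays)  # additive blend, clamped to 8 bits
        history.append(shown)
        rays = round(alpha * shown)     # next frame samples its own output
    return history


print(simulate_feedback(100, 0.1, 5))  # [100, 110, 111, 111, 111]
print(simulate_feedback(100, 0.9, 5))  # [100, 190, 255, 255, 255]
```

With a second render the sample never contains the rays, so the loop above disappears entirely.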

However, I found a much faster way. Render the scene at full display size without the ray mesh. Then take the point sample from the backbuffer and copy it into the ray mesh's texture. Then move the camera 10000 units away, where the entire scene is out of rendering range (the ray mesh is parented to the camera, so it moves along), set CameraClsMode so the backbuffer is not cleared, and render the ray mesh alone on top of it. Finally, set CameraClsMode back to 1,1 and move the camera back into the scene. This lowered the rendering time of the effect from 23 to 16 ms, which is still very slow.

That's when I figured out the second thing: of those 16 ms, about 13 ms were spent on the LockBuffer BackBuffer() and UnlockBuffer BackBuffer() commands alone! Without the fast pixel reading it still took 13 ms; without the LockBuffer calls as well, it went down to about 1 ms.

So the main bottleneck seems to be LockBuffer. It seems to wait for some green light from DirectX that is in sync with the display refresh rate. Calling VWait right before LockBuffer lowered the time from 16 to 5 ms. But if VWait is used at all, it should always be followed by Flip 0, so the frame doesn't wait for the vertical blank a second time. Maybe I'll upload the source.
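The "green light in sync with the refresh rate" guess can be put into a toy model. Assume, hypothetically, that a lock is only granted during the vertical blank; then its cost depends on where in the refresh interval it is requested, which would explain why syncing with VWait first helps. The function `lock_wait_ms` and the 0.5 ms blank window are made-up illustration values; this is not how DirectX is documented to behave, just a sketch of the hypothesis.

```python
def lock_wait_ms(now_ms: float, refresh_hz: float = 60.0, blank_ms: float = 0.5) -> float:
    """Hypothetical model: how long a lock would block if it is only granted
    during the vertical blank of a display refreshing at `refresh_hz`."""
    period = 1000.0 / refresh_hz   # one refresh at 60 Hz is about 16.67 ms
    phase = now_ms % period        # time since the last refresh started
    if phase < blank_ms:           # still inside the assumed blank window
        return 0.0
    return period - phase          # otherwise block until the next blank


print(round(lock_wait_ms(20.0), 2))       # 13.33: requested mid-frame, waits most of a refresh
print(lock_wait_ms(2 * 1000 / 60 + 0.1))  # 0.0: requested right after a VWait-style sync
```

Under this model the lock is nearly free right after the blank, while a lock requested at a random point in the frame waits on average half a refresh.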

blitzgames, 4 years ago

Very interesting and cool insights you got there, as always! I'm curious about the FreeBASIC or inline-assembly way of making it faster, as I presume that's how FastExt does this effect.

There's also an idea from Fredborg, which RemiD described before, whose outcome I'm very interested in; you might want to look into it (quoted below). I guess you could use some form of light-trail effect for the rays. Perhaps you can have a go at it! 😊

"the idea was to have a subdivided quad parented to the camera, have its vertices colored with the sun color, and use linepicks from the sun to each vertex, and set the vertices alphas accordingly (if a light ray can reach a vertex, alpha 0.5, if a light ray can't reach a vertex, alpha 0)
with blendmode add or multiply2..."
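Here is a rough sketch of the quoted idea in Python, with spheres standing in for scene geometry and a segment-sphere test standing in for Blitz3D's LinePick. All names (`segment_hits_sphere`, `vertex_alphas`) and the sphere occluders are hypothetical; in Blitz3D you would do one LinePick from the sun to each vertex of the camera-parented quad and set that vertex's alpha from the result.

```python
Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def segment_hits_sphere(p0: Vec, p1: Vec, center: Vec, radius: float) -> bool:
    """Stand-in for a LinePick: does the segment p0->p1 pass through the sphere?"""
    d = _sub(p1, p0)
    dd = _dot(d, d)
    # Parameter of the point on the segment closest to the center, clamped to [0, 1].
    t = 0.0 if dd == 0 else max(0.0, min(1.0, -_dot(_sub(p0, center), d) / dd))
    closest = (p0[0] + t * d[0], p0[1] + t * d[1], p0[2] + t * d[2])
    off = _sub(closest, center)
    return _dot(off, off) <= radius * radius


def vertex_alphas(sun: Vec, vertices: list[Vec], occluders: list[tuple[Vec, float]],
                  lit_alpha: float = 0.5) -> list[float]:
    """Alpha per quad vertex: lit_alpha if a ray from the sun reaches it, else 0."""
    return [
        0.0 if any(segment_hits_sphere(sun, v, c, r) for c, r in occluders) else lit_alpha
        for v in vertices
    ]


sun = (0.0, 0.0, 100.0)
quad = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # two vertices of the camera-parented quad
blocker = ((0.0, 0.0, 50.0), 1.0)           # one sphere between the sun and the first vertex
print(vertex_alphas(sun, quad, [blocker]))  # [0.0, 0.5]
```

The cost is one pick per vertex per frame, so it scales directly with the quad's subdivision.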

jfkEO1010etc, 4 years ago

Interesting, but a ray resolution of 256x256 would mean 64k (65,536) LinePicks per frame, which might be slow too. And below 128x128 it gets really blurry.
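To put numbers on that: assuming one LinePick per grid vertex per frame, and (hypothetically) letting the picks use an entire 60 Hz frame with nothing left for rendering, this is how fast each pick would have to be. The function names are made up for the illustration.

```python
def picks_per_frame(resolution: int) -> int:
    """One LinePick per vertex of a resolution x resolution grid."""
    return resolution * resolution


def max_us_per_pick(resolution: int, budget_ms: float) -> float:
    """Longest a single pick may take (in microseconds) if all picks must fit in budget_ms."""
    return budget_ms * 1000.0 / picks_per_frame(resolution)


for res in (128, 256):
    print(res, picks_per_frame(res), round(max_us_per_pick(res, 1000 / 60), 3))
# 128 16384 1.017
# 256 65536 0.254
```

So at 256x256 each pick would need to finish in about a quarter of a microsecond, and going down to 128x128 only buys a factor of four.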

blitzgames, 4 years ago

I was thinking of another, simpler way: render the rays, then apply a 2D image mask for the objects in front, so the whole thing becomes a one- or two-pass render-to-texture effect.

jfkEO1010etc, 4 years ago

Either way, although the FastExt library by mikail seems to have gone extinct online, luckily I found my purchased copy, which includes RenderToTexture, on my hard drive. If time permits...

blitzgames, 4 years ago

There's a copy on the BlitzCoder forum as well, for posterity. BTW, would a normal render-to-texture work, not specifically using FastExt?

jfkEO1010etc, 4 years ago

Maybe I'm wrong, but as far as I remember, Blitz3D doesn't support render-to-texture. The addition of texture flag 256 ("store texture in VRAM") allowed faster access, but not direct rendering to a texture. CopyRect from the backbuffer to a VRAM texture is rather fast, though. LockBuffer is the slow part and the actual problem. A way around it would be:

1. Render the actual scene, without the rays.
2. Render a 256x256 mini version and CopyRect it to the texture buffer (optimized, with details like grass hidden).
3. Render the ray mesh alone on top.

This three-render method doesn't allow pixel access, so I'd use EntityColor 0,0,0 on shadow-casting objects, like in the source. Most importantly, no LockBuffer is required, which should lower the 16 ms to about 2 ms. The beauty of extracting the texture from the first render using ReadPixel etc. was that its speed is independent of scene complexity, but, well, it required LockBuffer, so...

Edit: I just tried that; it's not significantly faster, unless I did something wrong. There was also a hack to directly peek/poke VRAM and buffers, I think using memory.lib. Most likely not very stable.

blitzgames, 4 years ago

Yes, I meant the CopyRect function.

Do you think a RenderToTexture function is faster than CopyRect?

More interesting findings you got there! 👍👍👍

BTW, I did a decent benchmark, which I posted as a new comment above, since these nested replies are getting too long and narrow... 😅

itch.io
