Viewing post in God-Rays for Blitz3D comments
Thanks. While the effect is still too centered for something that comes from all sides (which is illogical anyway, when you look at the shading of the trees), it could be faded in and out dynamically when one is looking at the actual sun, or where the sun would be. The effect adds only a few hundred tris to the scene, and they are blended in add mode and can be put in front of everything with EntityOrder, so this ray mesh isn't the bottleneck.
One thing that is slow is copying this render to the texture buffer, even with the 256 flag. Render to texture would be faster. I think I remember being able to do that using the FastLib for Blitz3D. The other burden is the render itself. This can be optimized by hiding the grass and maybe the bushes temporarily. And of course, as mentioned in the code, by using EntityColor to turn the trees and the ground black during this render, rather than using WritePixelFast as in this demo (which I do in the experimental version of Relic Hunter). However, at the end of the day it really is an additional render, just like a cubemap would be, or some other fancy stuff, so we have to take the polycount into account.
That said, when I first tested the trees without LOD, I threw 600k tris at my cheapo onboard card (AMD Radeon R3) and it rendered them like a 1-cube scene, I guess at 60 fps. So I'm getting a bit wasteful in terms of polycount. Maybe 1000 tris for a tree is still too much (the pine tree code could make them less dense). I guess DX7/Blitz3D still forces us to go more low-poly, as it lacks certain FAST features like shaders, automatic LOD, render to texture etc. But I can't help myself: I had Unity installed and, to put it mildly, I didn't like it. Also, these days there are so many engines out there that I could spend my whole life just testing them, and from past engine tests I know it usually gets you nowhere. Still, a Blitz3D to WebGL converter would be very nice.
Oh and BTW (also sorry for the text avalanche), as I'm thinking about it, I may as well try an entirely different approach that uses only 1 render: do a point-sampled mini copy of the main render from the backbuffer, e.g. 256x256 pixels, then scale it down to 128x128 while smoothing the point samples (blur-shrink), probably using inline assembler (like you can in GFA BASIC, and probably in FreeBASIC, which can build a DLL that can then be used in Blitz3D via userlib decls). And when reading the point samples from the render, ignore anything that isn't bright enough (or lower it to rgb 0).
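For anyone who wants to see the arithmetic of that blur-shrink idea, here is a minimal sketch. It is in Python rather than Blitz3D or assembler, just to show the steps on a grayscale grid. The function names, the `threshold` value and the plain 2x2 box average are my assumptions, not something taken from the demo source.

```python
def point_sample(frame, size):
    """Pick size x size evenly spaced pixels from a larger frame, no filtering."""
    height = len(frame)
    width = len(frame[0])
    return [
        [frame[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]


def blur_shrink(src, threshold=200):
    """Halve a grayscale image by averaging each 2x2 block.

    src is a list of rows of brightness values (0..255) with even dimensions.
    Samples below `threshold` (an assumed cut-off) count as 0, so only the
    bright parts of the render survive.
    """
    out = []
    for y in range(len(src) // 2):
        row = []
        for x in range(len(src[0]) // 2):
            total = 0
            for dy in (0, 1):
                for dx in (0, 1):
                    value = src[2 * y + dy][2 * x + dx]
                    if value >= threshold:
                        total += value
            row.append(total // 4)  # integer average of the 2x2 block
        out.append(row)
    return out
```

Calling `blur_shrink(point_sample(frame, 256))` on a full-size frame would give the 128x128 version described above.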
Ok, in case anyone is interested: I tried the above idea (not the ASM part) and two things became clear. First of all, doing only one render inevitably causes recursive feedback, because the sampled points are brightened and then sampled again repeatedly. That forced me to use a very low alpha, but even then it stabilizes only due to rounding errors, which makes it flicker wildly. So I concluded there is no way around a 2nd render.
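The feedback problem can be illustrated with a toy model of one sampled pixel. This is only a sketch under assumptions: `gain` is a hypothetical number standing for the ray alpha multiplied by however many ray layers overlap at that pixel, and the model ignores the integer rounding that causes the flicker.

```python
def feedback_brightness(scene, gain, frames):
    """Brightness of one sampled pixel after `frames` frames of feedback.

    Each frame, the pixel shows the scene brightness plus the rays generated
    from what was sampled the frame before, clamped to the 0..255 range.
    """
    value = scene
    for _ in range(frames):
        value = min(255.0, scene + gain * value)
    return value
```

In this model a gain below 1 settles at scene / (1 - gain), which is already brighter than the scene itself, and a gain of 1 or more runs straight into the 255 clamp. That fits the observation that only a very low alpha keeps it under control.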
However, I found a much faster way: render the scene without the ray mesh at full display size. Then do the point sample from the backbuffer and move it to the ray mesh texture. Then move the camera 10000 units away, where the entire scene is out of rendering range (the ray mesh is parented to the camera), set CameraClsMode to preserve the backbuffer, and render the ray mesh alone on top of it. Then set CameraClsMode to 1,1 again and move the camera back to the scene. This lowered the rendering time of the effect from 23 to 16 ms, which is still very slow.
That's when I figured out the second thing: of those 16 ms, about 13 ms were spent on the commands LockBuffer BackBuffer() and UnlockBuffer BackBuffer() alone! I tried it without any fast pixel reading and it still took 13 ms, then also without LockBuffer and it went down to about 1 ms.
So the main bottleneck seems to be LockBuffer. It seems to wait for some green light from DirectX, which is in sync with the system framerate. I tried VWait right before LockBuffer and was able to lower the total from 16 to 5 ms. But VWait should always be followed by Flip 0, if used at all. Maybe I'll upload the source.
Very interesting and cool insights you got there, as always! I'm curious about the FreeBASIC or inline assembly way to make it faster, as I would presume this is how FastExt does this effect.
There is also an idea from Fredborg, which RemiD described before and which I've quoted below. I am very interested in how it would turn out, so you might look into it. I guess you might use some form of light trails effect for the rays, and perhaps you can have a go at it!
"the idea was to have a subdivided quad parented to the camera, have its vertices colored with the sun color, and use linepicks from the sun to each vertex, and set the vertices alphas accordingly (if a light ray can reach a vertex, alpha 0.5, if a light ray can't reach a vertex, alpha 0)
with blendmode add or multiply2..."
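A rough sketch of that per-vertex test, in Python just to show the geometry. In Blitz3D the visibility check would presumably be a LinePick against the pickable scene. Here spheres stand in for the occluders, and all names and the 0.5 default are illustrative assumptions based on the quote.

```python
def segment_hits_sphere(p0, p1, center, radius):
    """True if the segment p0 -> p1 passes within `radius` of `center`."""
    direction = [b - a for a, b in zip(p0, p1)]
    to_center = [c - a for a, c in zip(p0, center)]
    length_sq = sum(d * d for d in direction)
    if length_sq == 0:
        t = 0.0
    else:
        # Project the sphere center onto the segment and clamp to its ends.
        t = sum(d * f for d, f in zip(direction, to_center)) / length_sq
        t = max(0.0, min(1.0, t))
    closest = [a + t * d for a, d in zip(p0, direction)]
    dist_sq = sum((c - q) ** 2 for c, q in zip(center, closest))
    return dist_sq <= radius * radius


def vertex_alphas(sun, vertices, occluders, lit_alpha=0.5):
    """Alpha per quad vertex: lit_alpha if the sun can reach it, else 0.

    occluders is a list of (center, radius) spheres.
    """
    alphas = []
    for vertex in vertices:
        blocked = any(
            segment_hits_sphere(sun, vertex, center, radius)
            for center, radius in occluders
        )
        alphas.append(0.0 if blocked else lit_alpha)
    return alphas
```

With one pick per vertex per frame, the cost depends on how finely the quad is subdivided, so the vertex count would be the thing to tune.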
Maybe I'm wrong, but as far as I remember Blitz3D doesn't support render to texture. The addition of flag 256 allowed for faster access ("Store texture in vram"), but no direct render to texture. CopyRect from the backbuffer to a VRAM texture is rather fast tho. LockBuffer is slow and the actual problem. A way around would be:
render actual scene, without rays.
render mini version 256x256 and CopyRect it to the texture buffer (optimized with details like grass hidden)
render rays mesh alone on top
This 3-render method does not allow for pixel access, so I'd use EntityColor 0,0,0 for shadow-casting things, like in the source. Most important: no LockBuffer required. That should lower the 16 ms to 2 ms. The beauty of extracting the texture from the first render using ReadPixel etc. was that its speed is independent of scene complexity, but well, it required LockBuffer, so..
edit: I just tried that, and it's not significantly faster, unless I did something wrong. Then there was also a hack to directly peek/poke VRAM and buffers, I guess using the memory.lib. Most likely not very stable.