Performance of a custom engine on a standalone VR headset


When I started to work on Tea For God, I had no idea that in a few years there would be a standalone VR headset, or what would be inside it. I decided to go with a custom engine, as both Unity and Unreal seemed to require lots of additional work to handle the portal-based world I imagined. Unity could do it, but only with some hacks and lots of tinkering: the existing games based on impossible spaces are often placed in a single space with portals spread all over the place. Sometimes there are just floors on top of each other. Unreal could do it in a more sophisticated way, and I could incorporate portal rendering into the engine, but any update to the engine could mean lots of additional work merging my changes. And rendering is just one part of that. There's also sound, physics, AI, scripting. Of course, a custom engine required me to implement lots of stuff on my own, but I only added what I needed.

One of the things I wanted to have in my engine was multithreading. I wanted to spread all the workload among the numerous cores that PCs have now. I already covered this topic a while ago.

When Quest was announced, I learned that it was going to have 8 cores. "Yay!" 4 of them would handle tracking. "So there's still 4 left!"

Four cores seemed to be enough. When I made the early version of the game public, it was struggling to hold 72 FPS but was still okayish. And then a system update came that wrecked the framerate. It turned out that my engine, with its 4 threads and the way they were being handled, was hogging the CPU and preventing anything else from running properly. I also realised that my game was actually creating more threads than that - the sound system used another one. The headset's OS was rightly deciding to give some time to other threads so all the work could get done. It was up to me to leave some room for other parts of the system to do their work.

I switched to three threads and, with some fiddling, managed to offload some work onto the rendering thread.

Now, a few words about how my engine works with threads. There were three threads then:

  1. The rendering thread, which was building render scenes and then encoding commands for the GPU. When it was done processing the scene to render, it was helping the main gameplay thread. This thread was also communicating with the OS to handle events and was doing some synchronous work (high-level world management).
  2. The main gameplay thread, which was doing all the gameplay-related jobs. When I went from 4 threads to 3, I also squeezed handling of the sound scene in here (which sounds are heard, where to place them relative to the player etc).
  3. The last thread, which was responsible for creating the world: generating meshes and rooms, building the navmesh, finding nav paths and a few more things. When it was idling, it was helping the main thread, which allowed the CPU to work at a lower frequency.

It was working okay, but every now and then I had hitch frames: frames that were taking longer to process. I made lots of improvements, making sure that in most places the frame stays below 50% of the allowed time (at 72 FPS a frame has about 13.9 ms, so the target was roughly 7 ms). I needed it to be this low to have extra headroom for more action.

But the hitch frames were still there. Things became worse when I started to port the game to other VR headsets.

I found out that some random jobs were stalling. They were taking 9ms or sometimes 20ms or sometimes even 90ms. I couldn't find a cause in the code. Sometimes they were quite simple tasks that should not take this long. It puzzled me. I had no idea what was going on.


I decided to look into what was happening with the threads themselves.

Now, a few words about multithreading in general and how it works on PC. Various threads may have different priorities. If you have some time-critical stuff to do, you give the thread a higher priority. If something may take more time, you may go with a lower priority. Windows is quite flexible when it comes to priorities.

Standalone VR headsets use Android, and with both Linux and Android things are a bit different. There are a few parameters that describe a thread. There's the "nice" level, which says how nice a thread is to other threads: if it is nice (a higher value), it lets other threads use more time; if the value is low, it wants to take more time itself. Then there's the priority, but priority only works with a certain type of thread: a real-time thread. The scheduler policy tells what kind of thread we have. Real-time threads are meant for tasks that shouldn't be paused. They should work at full speed as much as possible. Other threads can be considered background tasks. They may be paused for a few milliseconds or a few dozen milliseconds.
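
As a rough illustration of these parameters, here is a minimal C++ sketch, assuming a Linux or Android target, that reads the calling thread's scheduler policy and nice level with the standard `sched_getscheduler` and `getpriority` calls. The helper names (`is_realtime_policy`, `policy_name`, `ThreadSchedInfo`, `query_current_thread`) are made up for this example and are not taken from the engine.

```cpp
#include <sched.h>         // sched_getscheduler, SCHED_* constants
#include <sys/resource.h>  // getpriority, PRIO_PROCESS
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helpers for illustration; none of these names come from the engine.

// Real-time policies are the ones where the thread priority is actually used.
bool is_realtime_policy(int policy) {
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

std::string policy_name(int policy) {
    switch (policy) {
        case SCHED_OTHER: return "SCHED_OTHER (normal, uses nice)";
        case SCHED_FIFO:  return "SCHED_FIFO (real-time)";
        case SCHED_RR:    return "SCHED_RR (real-time)";
        default:          return "other";
    }
}

struct ThreadSchedInfo {
    int policy;     // scheduler policy of the calling thread
    int nice;       // nice level: higher means "nicer" to other threads
    bool realtime;  // true when the policy is a real-time one
};

// On Linux, passing 0 to both calls refers to the caller.
ThreadSchedInfo query_current_thread() {
    ThreadSchedInfo info{};
    info.policy = sched_getscheduler(0);
    assert(info.policy >= 0);  // -1 would mean the query itself failed
    errno = 0;  // getpriority may legitimately return -1, so errno is the error signal
    info.nice = getpriority(PRIO_PROCESS, 0);
    info.realtime = is_realtime_policy(info.policy);
    return info;
}
```

Calling `query_current_thread()` from each engine thread and logging `policy_name(info.policy)` is one way to check which threads really ended up with a real-time policy.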

Another thing is that only privileged tasks/processes can change the scheduler policy to real-time. This is to avoid situations where the CPU is hogged by a few processes fighting for time, which can leave other, sometimes crucial, processes starving and stalling.

When OpenXR was introduced, a bunch of extensions to it were added. One of them was designed especially for operating systems that have strict rules for multithreading. It allows an application to turn threads into real-time threads without being given extra privileges.

At the moment of writing this post, the Meta/Oculus implementation allows only two threads to be real-time.

Let's get back to how my engine uses threads:

  1. rendering/system thread is a real-time thread
  2. gameplay/sound thread is a real-time thread
  3. the additional thread is not a real-time thread

Then it hit me. The jobs that were stalling were running on the third thread. Most of the time I was lucky it didn't affect the framerate but every now and then that additional thread was paused.

The solution was easy - the additional thread should only do background stuff. 

But... The game still managed to stall. It was happening less often, though.

I checked my performance tool and it wasn't a game job which was a relief. It was one of the background jobs but quite a particular one.

Another small explanation. A frame in my engine can be broken into three phases:

  1. Non-physical advancement. The game world does not change at all. All objects remain in the same place, nothing appears or disappears. During this phase render and sound scenes are built and AI is advanced (as it merely gives out orders to other systems).
  2. Physical advancement. The objects may move in the game world. They can change their placement, room, animate and so on. But the rooms remain as they were, no new objects are created or destroyed.
  3. Synchronous actions. Anything can happen as long as it happens on a single thread. This phase is rarely used during a single frame. This is the place to add new rooms and doors, connect them or spawn new NPCs. It is also the rare situation in which something can block everything else. Hence the actions have to be short and as singular as possible.

Background tasks may create new places, and lots of the work goes into generating meshes. Once something is created, it has to be put into the world. This happens during the "synchronous actions" phase. But as that phase was being run from the background thread, the operating system could pause the thread in the middle of it, blocking everything else.

The solution? The background task creates a job and waits for its completion. This is a bit of additional work, but as it happens on the background thread we don't really care too much. The job itself is done by the gameplay thread when it is not advancing the world or creating the sound scene. As that is a real-time thread, it will do the job really quickly and get back to waiting for more work (a new synchronous job, the next gameplay loop or the sound scene building).
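
The handoff described above could look roughly like the following C++ sketch. It assumes a single queue that the gameplay thread drains between its other duties. `SyncJobQueue`, `post`, `run_pending` and `add_room_to_world` are names invented for this example, and the engine's actual job system is surely different.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch: names and structure are assumptions, not the engine's code.
class SyncJobQueue {
public:
    // Called from the background thread. Returns a future the caller can wait on.
    std::future<void> post(std::function<void()> job) {
        assert(job && "job must be callable");
        std::packaged_task<void()> task(std::move(job));
        std::future<void> done = task.get_future();
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(task));
        return done;
    }

    // Called from the real-time gameplay thread when it is not advancing the
    // world or building the sound scene. Returns how many jobs it ran.
    std::size_t run_pending() {
        std::vector<std::packaged_task<void()>> jobs;
        {
            // Hold the lock only long enough to take the jobs out.
            std::lock_guard<std::mutex> lock(mutex_);
            jobs.swap(pending_);
        }
        for (auto& job : jobs) job();  // running a task makes its future ready
        return jobs.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::packaged_task<void()>> pending_;
};

// Background-thread side: hand the short synchronous step over and block until
// the gameplay thread has run it. Being paused while waiting here does not
// hold up the frame, because the synchronous work runs on the other thread.
void add_room_to_world(SyncJobQueue& queue, std::function<void()> attach_room) {
    queue.post(std::move(attach_room)).wait();
}
```

The point of this shape is that the only code that touches the world during the synchronous phase runs on a real-time thread, while the thread that can be paused at any moment merely waits.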

So far so good, right? It should all be working nicely now, shouldn't it?

Unfortunately, no. The rendering thread was struggling with pushing all the jobs to the GPU. After a short investigation, I saw what the problem was: too many changes to shaders, too many setups of vertex buffers. Avoiding that is something Vulkan enforces (although I expect that you could abuse it if you really wanted to), but as I am still on OpenGL, I had to investigate what I could do about it.

RenderDoc came once again to the rescue. Shaders were being changed so often because one feature was misused: it was reverting some of the parameters and then setting them back again. This was a quick thing to find and fix.

But the rendering setup had me going in circles for a while. I realised that I should go with Vulkan, but I started to work on the engine before Vulkan was there, and then there was never a good time to switch. Then I noticed that I hadn't used all the tools that OpenGL has to offer, namely Vertex Array Objects (VAOs).

I've been using Vertex Buffer Objects. VBOs contain all the vertices that are required to render a model. Each vertex has a position, a normal and a few more attributes (colour, texture coordinates etc). Buffer objects can also hold the information about which vertices form the triangles that are rendered. The problem here is that if you use VBOs only, you still have to set up a few other things before every draw, such as binding the buffers and describing the layout of each vertex attribute.
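
To make "a few other things" concrete: without a VAO, each draw has to rebind the vertex and index buffers and describe every vertex attribute again, while a VAO remembers that setup so a single bind restores it. The C++ sketch below only counts the OpenGL calls involved (the GL function names appear in comments, so it runs without a GL context). It assumes nothing is cached between draws. `MeshLayout` and the counting functions are hypothetical, and the numbers are an estimate, not a measurement from the game.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one mesh: how many vertex attributes it has
// (position, normal, colour, texture coordinates, ...).
struct MeshLayout {
    std::size_t attribute_count;
};

// State-setup calls needed before each draw when only VBOs are used.
std::size_t setup_calls_vbo_only(const MeshLayout& mesh) {
    assert(mesh.attribute_count > 0);  // a mesh needs at least a position
    std::size_t calls = 0;
    calls += 1;                     // glBindBuffer(GL_ARRAY_BUFFER, vbo)
    calls += 1;                     // glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo)
    calls += mesh.attribute_count;  // glEnableVertexAttribArray(i) per attribute
    calls += mesh.attribute_count;  // glVertexAttribPointer(i, ...) per attribute
    return calls;
}

// With a VAO the same state was recorded once at load time.
std::size_t setup_calls_with_vao(const MeshLayout&) {
    return 1;                       // glBindVertexArray(vao)
}

// Total setup calls for a frame that draws `draws` meshes of the same layout.
std::size_t setup_calls_per_frame(std::size_t draws, std::size_t calls_per_draw) {
    return draws * calls_per_draw;
}
```

For a mesh with four attributes this gives 10 setup calls per draw against 1, so a frame with 500 such draws would go from 5000 setup calls to 500.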

When I implemented VAOs, the number of calls went down a lot, and the framerate stabilised. I may still get occasional hitch frames, but these should be handled case by case and no longer considered a general issue - they could vary from "a place with too much to render" to "too many objects advancing". One of the solutions I want to try is to temporarily boost performance by forcing the CPU and/or the GPU to jump to the highest performance level for a brief moment. Most such moments happen in hand-made scenes that last for a few seconds.

There are also a few more lessons to take from the problems I ran into.

  1. Go with Data-Oriented Design. I didn't, although now I follow DOD whenever possible. Games on standalone VR headsets should assume that only one thread will be doing gameplay, so there's little reason to multithread it. Some background jobs can run elsewhere, but the bulk of the work will be done on a single thread, and the top priority is to make the game run as fast as possible on that thread. DOD helps a lot with that.
  2. Even though lots of tools translate nicely to different platforms, each system has its own quirks and issues. I underestimated how important they are, as I thought that multithreading on Android would work similarly to how it does on Windows.
  3. This one is more of a reminder: if you run into a problem, investigate first. Investigate carefully and if something doesn't make sense, try to find a reason why it makes no sense.

There's one extra thought I have for myself. I will have to switch to Vulkan one day. And it might be sooner than I anticipated. The reason is something a bit unrelated to performance. I played a bit with PSVR2, which has foveated rendering that uses eye tracking. It seems to be impossible to detect while playing, and the reason is that it smoothly follows the eye. With the tile-based rendering that standalone VR headsets depend on, you have to either accept foveated rendering switching the resolution of whole tiles, or use an extension to offset the tiles so that they are always relative to where the player looks. And that extension is not available if you use OpenGL.
