Nice submission!
Is this a custom HTML engine of yours? I poked through the code a bit, and that seemed like the case. That’s very cool to me!
The text-to-speech was too slow for my liking. I would recommend adding a way to adjust the rate of the speech utterances.
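In case it helps, here is a rough sketch of what I mean, assuming the speech goes through the browser's Web Speech API (I haven't confirmed that from the code). `clampRate` and `applyRate` are hypothetical helper names of mine, not anything from your project. The 0.1 to 10 range is what the spec allows for `SpeechSynthesisUtterance.rate`, with 1 as the default; individual voices may support a narrower range.

```typescript
// Range the Web Speech API allows for SpeechSynthesisUtterance.rate.
const MIN_RATE = 0.1;
const MAX_RATE = 10;
const DEFAULT_RATE = 1;

// Hypothetical helper: keep a user-chosen rate inside the allowed range.
function clampRate(requested: number): number {
  // NaN or Infinity (e.g. from an empty input) falls back to the default.
  if (!isFinite(requested)) return DEFAULT_RATE;
  return Math.min(MAX_RATE, Math.max(MIN_RATE, requested));
}

// Hypothetical helper: typed structurally so it accepts a real
// SpeechSynthesisUtterance or any object with a numeric `rate`.
function applyRate(utterance: { rate: number }, requested: number): void {
  utterance.rate = clampRate(requested);
}

// In the browser it might be wired up like this (assumed usage):
//   const utterance = new SpeechSynthesisUtterance(sceneText);
//   applyRate(utterance, Number(rateSlider.value));
//   window.speechSynthesis.speak(utterance);
```

A range slider or a pair of faster/slower buttons feeding `applyRate` would probably be enough.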
That led me to try it with NVDA. It’s usable, but there are some issues that could probably be resolved with an afternoon of testing. The main one is that updates to the text aren’t read aloud: you have to tab back to the speech toggle and then read the text line by line with the arrow keys. I would recommend making the .scene__text region focusable with tabindex="0" and calling .focus() on it whenever its contents change, or trying aria-live="polite" on it instead.
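Both options could look roughly like the sketch below. The function names and the `SceneRegion` type are made up for illustration; the only thing taken from your page is the `.scene__text` class. I haven't tested this against your engine, so treat it as a starting point for that afternoon of testing rather than a fix.

```typescript
// Minimal shape of the element the sketch needs. A real HTMLElement
// satisfies this, and so does a plain test double.
interface SceneRegion {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
  focus(): void;
}

// Option 1 (hypothetical helper): make the region focusable, swap in the
// new text, then move focus to it so the screen reader starts there.
function showSceneText(region: SceneRegion, text: string): void {
  region.setAttribute("tabindex", "0");
  region.textContent = text;
  region.focus();
}

// Option 2 (hypothetical helper): mark the region as a polite live region.
// This should run once at startup, before any text changes, so that later
// updates to the region can be announced without moving focus.
function makePoliteLiveRegion(region: SceneRegion): void {
  region.setAttribute("aria-live", "polite");
}

// In the browser (assumed usage):
//   const region = document.querySelector<HTMLElement>(".scene__text");
//   if (region) showSceneText(region, nextPassage);
```

I would try one option at a time. Moving focus and using a live region together can make some screen readers read the same text twice.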
The actual story itself? I got distracted writing this, so I’ll report back later.