A short story of C128 ports

Amaurote 64 & 128

Right after the Amaurote 64 release, people mentioned that the C128's 80-column mode would be a perfect platform for this game. I contacted Brush and said that although I wouldn't complete such a port, I could have a look.

That was about a month ago and I couldn't have been more wrong. There is not just one C128 port, but two versions of the game: one for the VIC (40 columns) and one for the VDC (80 columns). The VDC version is self-explanatory: it's supposed to run in 2MHz mode. But what can be so special about the VIC version?

Well, the C128's MMU has features that, combined with some careful coding and fortunate choices by the original designer, allowed me to significantly speed up the VIC-based game too. The 2MHz mode had nothing to do with it: it's already used by Amaurote 64 if you run it on a C128 in C64 mode. The MMU can remap the zero page and the CPU stack onto any other memory page in the whole 128K area. If you respect page boundaries and check for $00/$01 accesses (because the CPU port is still there), then any page of RAM, in any bank, can temporarily become the zero page and be accessed using the CPU's fast zero-page addressing mode.
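To make the remapping rule concrete, here is a toy model in Python. It is not real hardware access and not code from the game; the function name `zp_target` and the example page and bank are made up for illustration. It shows where a zero-page access lands once page 0 has been pointed elsewhere, and why the pay-off is measured in cycles.

```python
# Toy model of C128 zero-page relocation. On the real machine the page
# pointer is set through the MMU registers P0L/P0H ($D507/$D508).
CPU_PORT = (0x00, 0x01)  # CPU I/O port registers: never relocated


def zp_target(zp_addr, page, bank):
    """Return where a zero-page access lands after relocation.

    `page` and `bank` are the hypothetical relocation target.
    """
    if not 0 <= zp_addr <= 0xFF:
        raise ValueError("zero-page addresses are $00-$FF")
    if zp_addr in CPU_PORT:
        return ("cpu_port", zp_addr)
    return (bank, (page << 8) | zp_addr)


# 6502-family cycle counts for the addressing modes involved
# (abs,x loads take one more cycle when a page boundary is crossed).
CYCLES = {
    "lda zp": 3, "lda abs": 4, "lda abs,x": 4,
    "sta zp": 3, "sta abs": 4, "sta abs,x": 5,
}

# One copied byte: indexed load + indexed store versus
# absolute load + zero-page store (loop overhead not counted).
indexed = CYCLES["lda abs,x"] + CYCLES["sta abs,x"]
unrolled = CYCLES["lda abs"] + CYCLES["sta zp"]
saved_per_byte = indexed - unrolled
```

With page $40 of bank 1 mapped in, `zp_target(0x10, 0x40, 1)` resolves to address $4010 in bank 1, while $00 and $01 still hit the CPU port. The per-byte saving looks small, but it is multiplied by every byte of every screen refresh, and it comes on top of removing the loop counter and branch.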

There were two very fortunate choices in the original game design that allowed for a speedup using zero page mapping. There are 256 pixels in a line, so every line takes exactly 32 bytes and 8 lines fit nicely in a memory page. Furthermore, there is a frame around the game area that covers the first and last 16 pixels. It takes away four bytes in every line and eliminates most of the checks for possible conflicts with the CPU port.

On top of that I had the luxury of having an extra 64K of RAM. The obvious choice for optimisation is to trade space for speed and unroll loops. With 64K in another RAM bank it was possible to do these unrolls on a huge scale. There is no indexing at all in the most time-consuming procedures. All the indexes and offsets were computed in the assembly step. For example, copying data from the screen buffer into the VIC screen is combined with the transformation from linear bitmap into charsets and expressed by thousands of load/store operations with absolute addressing. All those bytes are stored into the zero page, whose underlying location keeps shifting.
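The layout arithmetic and the unrolling can be sketched with a small generator that prints assembly text. This is only an illustration of the idea, not the game's build script: the buffer address `$4000` is hypothetical, and the destination layout (`column * 8 + row`, i.e. eight consecutive bytes per character) is an assumption about how a charset page could be arranged.

```python
# Build-time generator for one page of unrolled copy code (a sketch).
BYTES_PER_LINE = 256 // 8                 # 256 pixels -> 32 bytes
LINES_PER_PAGE = 256 // BYTES_PER_LINE    # 8 lines fill one 256-byte page
FRAME_BYTES = 16 // 8                     # 16 frame pixels -> 2 bytes per side


def unrolled_page_copy(src_page_addr):
    """Emit lda/sta pairs copying one linear page into a zero-page charset page.

    src_page_addr: hypothetical address of the source page in the screen buffer.
    The destination is whatever page is currently mapped as zero page.
    """
    code = []
    for row in range(LINES_PER_PAGE):
        # Skip the frame columns; this also keeps $00/$01 untouched.
        for col in range(FRAME_BYTES, BYTES_PER_LINE - FRAME_BYTES):
            src = src_page_addr + row * BYTES_PER_LINE + col   # linear bitmap
            dst = col * 8 + row                                # assumed charset layout
            code.append(f"    lda ${src:04x}")
            code.append(f"    sta ${dst:02x}")
    return code


listing = unrolled_page_copy(0x4000)
```

Every address in the listing is a constant, so nothing is left to compute at run time. Because the two leftmost columns belong to the frame, the lowest destination written is $10, well clear of the CPU port at $00/$01.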

On every screen refresh a hexagonal frame is overlaid on top of the game area. Since the frame shape is fixed, its gfx data and bitmasks are predefined in the assembler source, and every data access is either an immediate value or a zero page load/store at a fixed address, with no indexing.
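A fixed frame means the masks can be baked into the instructions themselves. The sketch below generates such code from a list of `(offset, keep_mask, gfx_bits)` tuples; the tuple format and the function name are invented for this example, and the real game's data layout may differ.

```python
def overlay_code(frame):
    """Emit assembly that overlays fixed frame graphics on a zero-page mapped page.

    frame: iterable of (zp_offset, keep_mask, gfx_bits), all known at build time.
    keep_mask has 1-bits where the game picture shows through.
    """
    out = []
    for offset, keep_mask, gfx in frame:
        if offset in (0x00, 0x01):
            raise ValueError("offset collides with the CPU port")
        if keep_mask == 0xFF and gfx == 0x00:
            continue                                  # frame does not touch this byte
        if keep_mask == 0x00:
            out.append(f"    lda #${gfx:02x}")        # frame covers the whole byte
        else:
            out.append(f"    lda ${offset:02x}")      # zero-page load
            out.append(f"    and #${keep_mask:02x}")  # immediate mask
            if gfx:
                out.append(f"    ora #${gfx:02x}")    # immediate frame pixels
        out.append(f"    sta ${offset:02x}")          # zero-page store
    return out


example = overlay_code([(0x10, 0x00, 0x3C), (0x12, 0xF0, 0x05), (0x14, 0xFF, 0x00)])
```

Bytes the frame never touches produce no code at all, and fully covered bytes need only a load-immediate and a store, so the generated routine does exactly as much work as the frame shape requires.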

The C128-VIC port uses over 32K of extra RAM for speed code alone. I'd say that it's the first C128 game that combines these MMU features, extra memory and zero page mapping, in this way.

The VDC port has its own tricks too. It uses a 256x200 monochrome bitmap mode with doubled pixels. On real hardware (VICE timings are off) this gives the CPU much more time to access VDC RAM: there are fewer columns and no attribute data to fetch.

The VDC port benefits a lot from having a screen with exactly the same layout as the game's screen buffer. Screen copying and sprite rendering are trivial, with no translation necessary. In this version I could briefly disable IRQs, which made further speedups based on stack remapping possible.
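When the layouts match, a screen copy is just a stream of bytes through the VDC's data register. The toy model below illustrates that in Python; `ToyVDC` and `copy_screen` are invented names, the base address is an assumption, and the model ignores the status-bit wait that real hardware requires before every register access.

```python
class ToyVDC:
    """Minimal model of VDC RAM access through its internal registers.

    Registers 18/19 hold the update address (high/low byte); register 31 is
    the data register, and the address auto-increments after each access.
    """

    def __init__(self):
        self.ram = bytearray(0x10000)   # 64K VDC RAM
        self.addr = 0

    def write_reg(self, reg, value):
        if reg == 18:
            self.addr = (self.addr & 0x00FF) | (value << 8)
        elif reg == 19:
            self.addr = (self.addr & 0xFF00) | value
        elif reg == 31:
            self.ram[self.addr] = value
            self.addr = (self.addr + 1) & 0xFFFF


def copy_screen(vdc, buffer, base=0x0000):
    """Copy a linear screen buffer to VDC RAM; return the register write count."""
    vdc.write_reg(18, base >> 8)
    vdc.write_reg(19, base & 0xFF)
    for byte in buffer:                 # no per-byte address translation
        vdc.write_reg(31, byte)
    return 2 + len(buffer)


SCREEN_BYTES = (256 // 8) * 200         # 32 bytes per line, 200 lines
screen = bytes((i * 7) & 0xFF for i in range(SCREEN_BYTES))
vdc = ToyVDC()
writes = copy_screen(vdc, screen)
```

A full 256x200 screen is 6400 bytes, so one refresh costs 6402 register writes in this model: two to set the address and one per byte.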

Even with all the optimisations, access to VDC RAM remained a bottleneck. You can notice it when more objects appear on the screen at the same time. This is the reason why the radar icon doesn't swing: I had to save some CPU time.

What is black & white for me may be shades of gray for you, so feel free to use keys 1 and 2 to switch screen colors while playing the VDC port.

Finally, the intro. The C64 intro displays a hires bitmap with a layer of sprites for additional color. It turns out that there is a mode for a VDC with 64K RAM that has exactly the same color restrictions: VDC FLI 8x1, 480x252. In order to display that mode I have to switch one VDC register value just before and just after vertical blank. It's completely out of sync with 50Hz PAL. If it overlapped with the music IRQ I could miss those events completely, resulting in a flickering screen.

The solution was to roughly estimate how often VBLANK happens, measure how long it is, set up an NMI for this purpose and let things calibrate automatically.

Behind every story of success there is an even longer story of failures. I had a lot of failures and hit many dead ends trying to improve VDC RAM throughput. This involved using the REU to stream data directly into the VDC data register, one write every clock cycle, or using the lightpen registers to latch the current row and estimate how much time is left until the next vertical blanking period.

I already had a short and elegant solution to push bytes into the VDC faster. It turned out that it works only on emulators, not on a real machine. I also tried to keep track of 'dirty' rows, to check whether rendering a lot of sprites at different Y positions could be replaced by a full screen refresh. It didn't work well.

So this is how I spent several evenings and nights of April 2022. I hope that by now you can see how excited I became about this project.

It was a challenge to myself to demonstrate how obscure and rarely used MMU features can be very practical. This C128 port is not just the C64 release with a different loading address. There is some additional quality to it.

I would like to thank Oziphantom for his valuable input.
Special thanks go to Tokra. The VDC FLI 8x1 intro picture wouldn't be possible without his work on hacking VDC graphical modes.
Big thanks also go to the participants of the https://c-128.freeforums.net/ forum. You have gathered an incredible amount of knowledge about the inner workings of the VDC and the C128.

Thank you all!

Maciej Witkowiak, a.k.a. YTM/Elysium

Files

  • amaurote-c128-vic.d64 170 kB
    May 29, 2022
  • amaurote-c128-vdc.d64 170 kB
    May 29, 2022