A few weeks ago a similar idea occurred to me. However, instead of a full game, I made a single challenge. I was thinking about how to measure the complexity of algorithms including all of their dependencies (libraries, OS, CPU), and subleq seemed like an ideal way to do it. Subleq is relatively fast and still very basic: the whole machine is a single instruction.
To test the idea, I chose a lossy image compression task: it is easy to check whether the resulting image is good, and it leaves plenty of room for different algorithms and optimizations. Because image data occupies a large amount of RAM, I chose 32-bit addresses and 32-bit data words.
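For anyone unfamiliar with subleq, here is a minimal sketch of an interpreter with 32-bit words. It is not the challenge's actual specification: the halting rule (a negative or out-of-range jump target), the memory layout, and the names `to_int32` and `run_subleq` are assumptions made for illustration.

```python
def to_int32(x):
    """Wrap an integer into the signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def run_subleq(mem, pc=0, max_steps=1_000_000):
    """Execute subleq instructions in place and return the number of steps run.

    Each instruction is three words (a, b, c):
        mem[b] -= mem[a]; if mem[b] <= 0, jump to c, otherwise go to pc + 3.
    Assumed convention: execution stops when pc is negative or out of range.
    Operand addresses are not validated in this sketch.
    """
    steps = 0
    while 0 <= pc and pc + 2 < len(mem) and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] = to_int32(mem[b] - mem[a])
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return steps


# Tiny program: B = B + A, using a scratch cell Z that starts at 0.
# Cells 0..8 hold three instructions; cells 9..11 hold A, B, Z.
A, B, Z = 9, 10, 11
program = [
    A, Z, 3,    # Z = Z - A  -> -A
    Z, B, 6,    # B = B - Z  -> B + A
    Z, Z, -1,   # Z = 0, then jump to -1 (halt, by the assumed convention)
    7, 5, 0,    # A = 7, B = 5, Z = 0
]
run_subleq(program)
print(program[B])  # prints 12
```

Even addition takes three instructions, which is why the code size of a complete solution is a reasonable proxy for the total complexity of everything it depends on.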
If anyone wants a complex challenge (the code plus data for a solution comes to roughly 50k-200k), I suggest taking a look: