Three key points that might help:

  • Access memory sequentially as much as possible, to maximize cache locality (i.e. cache hits).
  • Avoid floating point like the plague - even on DX machines.
  • Allocate memory upfront and try to recycle it as much as possible.
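
To make these three points concrete, here is a rough sketch in C. None of it comes from an actual game: the names (fixed_t, Sprite, sprite_pool, fill_screen and so on) and the sizes are made up for illustration. It also assumes a compiler that provides <stdint.h> with int64_t, which older DOS toolchains may not, so treat it as a starting point rather than drop-in code.

```c
#include <stddef.h>
#include <stdint.h>

/* --- Point 2: 16.16 fixed point instead of floating point ------------- */
/* Upper 16 bits hold the integer part, lower 16 bits the fraction.       */
typedef int32_t fixed_t;
#define FIX_SHIFT 16
#define FIX_ONE   ((fixed_t)1 << FIX_SHIFT)

fixed_t fix_from_int(int v) { return (fixed_t)v * FIX_ONE; }

/* Assumes an arithmetic right shift for negative values, which is what
   common compilers do, although C leaves it implementation-defined. */
int fix_to_int(fixed_t f) { return (int)(f >> FIX_SHIFT); }

/* Multiply through a 64-bit intermediate so the product cannot overflow
   before it is shifted back down (assumption: int64_t is available). */
fixed_t fix_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> FIX_SHIFT);
}

/* --- Point 3: allocate once at startup, then recycle ------------------- */
#define MAX_SPRITES 64 /* hypothetical limit */

typedef struct { fixed_t x, y, vx, vy; } Sprite;

static Sprite sprite_pool[MAX_SPRITES];  /* all storage reserved upfront */
static int    sprite_free[MAX_SPRITES];  /* stack of free slot indices   */
static int    sprite_free_count;

void pool_init(void)
{
    int i;
    /* Fill the stack so that slot 0 is handed out first. */
    for (i = 0; i < MAX_SPRITES; i++)
        sprite_free[i] = MAX_SPRITES - 1 - i;
    sprite_free_count = MAX_SPRITES;
}

/* Returns NULL when the pool is exhausted; never calls malloc. */
Sprite *pool_alloc(void)
{
    if (sprite_free_count == 0)
        return NULL;
    return &sprite_pool[sprite_free[--sprite_free_count]];
}

/* Pushes the slot back on the free stack so the next alloc reuses it. */
void pool_release(Sprite *s)
{
    sprite_free[sprite_free_count++] = (int)(s - sprite_pool);
}

/* --- Point 1: walk memory in the order it is laid out ------------------ */
#define SCREEN_W 320
#define SCREEN_H 200

static uint8_t framebuffer[SCREEN_W * SCREEN_H];

void fill_screen(uint8_t color)
{
    int x, y;
    /* y outside, x inside: consecutive writes hit consecutive addresses.
       Swapping the loops would jump SCREEN_W bytes on every write. */
    for (y = 0; y < SCREEN_H; y++)
        for (x = 0; x < SCREEN_W; x++)
            framebuffer[y * SCREEN_W + x] = color;
}
```

Typical use would be to call pool_init() once at startup, take sprites with pool_alloc() and hand them back with pool_release() when they die, so nothing gets allocated inside the game loop.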

EDIT: it doesn't work on my 486DX 50 MHz with 12 MB RAM. On my Raspberry Pi 3 with DOSBox (my main development machine), it gets stuck on the hourglass (but to be fair, my game runs quite slowly on that machine as well).