Indie game storeFree gamesFun gamesHorror games
Game developmentAssetsComics
SalesBundles
Jobs
(+1)

While I haven't actually written the Assembly code for it yet, I am working on a sprite rendering system test of some variety, to use in making games down the line.  Mostly posting this here as a to-do, and to share the design docs I have so far.  I'm (currently) targeting a 300kHz Custom setup with a 256x128 (8-bit by 7-bit) 512-color (9-bit) monitor, with 64kB of RAM and a 4MB harddrive.

A table of all the colors, as well as an easy means of finding their IDs.

The actual sprite format is as follows, fairly simple:

  • Number of addresses used for sprite (on the disk - they're always going to be hotloaded, for memory efficiency reasons)
  • Image metadata: first 8 bits are width, next 7 are height, next 8 are how many color IDs the image uses, while the last 9 are just ignored
  • List of color IDs, one per line, with each line being the actual 9-bit monitor color ID
  • List of pixel data, 4 pixels per line in a 1-dimensional array, each pixel being an 8-bit color ID.  Color ID 0 is transparent.

The pseudocode I wrote - of no particular language, it's just to articulate to myself what I would actually need to implement - for the actual render loop itself is as follows:

* Loop through in some form: estimated 21 cycles per loop tick that DOES NOT DRAW, otherwise 24
 *   Store both X and Y, use them in the loop.
 *   X *must* be a power of 2.  As must Y.
 *   That way, it's possible to evenly divide things & whatnot.
 *   Need to have a variable for "currentItem", a 0-3 int.  Used in figuring out which part of the current line it's on.
 *   Every time currentItem rises above 3, it increments the main counter, going down to the next line.
 *   -=LOOP LAYOUT=-
 *   PRE-LOOP: line = getdata(disk, registers, counter) //X=0, Y=0, counter=beginningOfImage, currentItem=0, registers[0:]=0, registers[4] = X,Y offset for extended SETDATA
 *   splice(line, registers[2], (currentItem*8), ((currentItem+1)*8)-1, (3-currentItem)*8, 1) //put the color ID into register 2 for comparison
 *   color = getdata(disk, global, id+1+registers[2])
 *   if color != 0 //make sure the pixel isn't transparent
 *   splice(color, registers[0], 22, 31, 9, 1) //splice the new 9-bit color in that we got from the color ID list.  goes from last 9 bits to first 9 bits.
 *   splice(X, registers[0], 23, 31, 17, 1) //splice the 8-bit X position in that we got from the incrementation.  goes from last 8 bits to bits[9:17].
 *   splice(Y, registers[0], 24, 31, 24, 1) //splice the 7-bit Y position in that we got from the incrementation.  goes from last 
 *   setdata(monitor, registers, 0, true, 4) //draw the pixel, with an X/Y offset in register 4 (the actual position the sprite was requested to be drawn at)
 *   endif
 *   currentItem += 1
 *   if currentItem >= 4
 *   currentItem = 0
 *   counter += 1
 *   line = getdata(disk, registers, counter)
 *   endif
 *   x += 1
 *   if x>= width
 *   y += 1
 *   x = 0
 *   endif
 *   if y>= height
 *   breakFunc
 *   endif

Next off, an example image's code, plus the image itself.  The formatting isn't...quite right for use in-game, but it gives a rough idea of what an image file looks like.

44 //# of addresses.  Equal to 1 + numCols + ((X*Y)/4).  ADDR 0
0b00001000_0010000_00001011_000000000 //X = 8, Y = 16, C = 11.  ADDR 1
019 //Colors[01] = 019, dark orange (30).  ADDR 2
030 //Colors[02] = 030, orange (30).  ADDR 3
093 //Colors[03] = 093, orange-grey (30).  ADDR 4
163 //Colors[04] = 163, dark lime-grey (90).  ADDR 5
236 //Colors[05] = 236, lime-grey (90).  ADDR 6
027 //Colors[06] = 027, black-yellow (60).  ADDR 7
100 //Colors[07] = 100, dark yellow (60).  ADDR 8
109 //Colors[08] = 109, mild yellow (60).  ADDR 9
054 //Colors[09] = 054, yellow (60).  ADDR 10
124 //Colors[10] = 124, lime (90).  ADDR 11
180 //Colors[11] = 180, lighter lime (90).  ADDR 12
0b00000000_00000000_00000000_00000000 //00, 00, 00, 00: 00,00 to 03,00 all transparent.  ADDR 13
0b00000000_00000000_00000000_00000000 //00, 00, 00, 00: 04,00 to 07,00 all transparent.  ADDR 14
0b00000000_00000000_00000111_00000111 //00, 00, 07, 07: 00,01 to 03,01 have helmet.  ADDR 150
0b00000000_00000000_00000000_00000000 //00, 00, 00, 00: 04,01 to 07,01 all transparent.  ADDR 16
0b00000000_00000110_00001001_00000111 //00, 06, 09, 07: 00,02 to 03,02 have helmet.  ADDR 17
0b00000111_00000000_00000000_00000000 //07, 00, 00, 00: 04,02 to 07,02 have helmet.  ADDR 18
0b00000000_00000110_00001000_00001011 //00, 06, 08, 11: 00,03 to 03,03 have helmet.  ADDR 19
0b00001010_00000000_00000000_00000000 //10, 00, 00, 00: 04,03 to 07,03 have helmet.  ADDR 20
0b00000000_00000110_00000001_00001000 //00, 06, 01, 10: 00,04 to 03,04 have helmet & arm.  ADDR 21
0b00001010_00000000_00000000_00000000 //10, 00, 00, 00: 04,04 to 07,04 have helmet.  ADDR 22
0b00000000_00000001_00000010_00000001 //00, 01, 02, 01: 00,05 to 03,05 have arm.  ADDR 23
0b00000001_00000000_00000000_00000000 //01, 00, 00, 00: 04,05 to 07,05 have arm.  ADDR 24
0b00000000_00000001_00000011_00000011 //00, 01, 03, 03: 00,06 to 03,06 have arm.  ADDR 25
0b00000100_00000101_00000100_00000101 //04, 05, 04, 05: 04,06 to 07,06 have cannon.  ADDR 26
0b00000000_00000001_00000001_00000001 //00, 01, 01, 01: 00,07 to 03,07 have arm.  ADDR 27
0b00000100_00000100_00000100_00000100 //04, 04, 04, 04: 04,07 to 07,07 have cannon.  ADDR 28
0b00000000_00000001_00000010_00000010 //00, 01, 02, 02: 00,08 to 03,08 have body.  ADDR 29
0b00000001_00000000_00000000_00000000 //01, 00, 00, 00: 04,08 to 07,08 have body.  ADDR 30
0b00000000_00000001_00000011_00000001 //00, 01, 03, 01: 00,09 to 03,09 have leg.  ADDR 31
0b00000001_00000000_00000000_00000000 //01, 00, 00, 00: 04,09 to 07,09 have leg.  ADDR 32
0b00000000_00000001_00000010_00000001 //00, 01, 02, 01: 00,10 to 03,10 have leg.  ADDR 33
0b00000010_00000001_00000000_00000000 //02, 01, 00, 00: 04,10 to 07,10 have leg.  ADDR 34
0b00000000_00000001_00000010_00000001 //00, 01, 02, 01: 00,11 to 03,11 have leg.  ADDR 35
0b00000010_00000001_00000000_00000000 //02, 01, 00, 00: 04,11 to 07,11 have leg.  ADDR 36
0b00000000_00000001_00000010_00000001 //00, 01, 02, 01: 00,12 to 03,12 have leg.  ADDR 37
0b00000010_00000001_00000000_00000000 //02, 01, 00, 00: 04,12 to 07,12 have leg.  ADDR 38
0b00000001_00000011_00000001_00000001 //01, 03, 01, 01: 00,13 to 03,13 have foot.  ADDR 39
0b00000010_00000001_00000000_00000000 //02, 01, 00, 00: 04,13 to 07,13 have foot.  ADDR 40.
0b00000001_00000011_00000001_00000011 //01, 03, 01, 03: 00,14 to 03,14 have foot.  ADDR 41.
0b00000001_00000000_00000000_00000000 //01, 00, 00, 00: 04,14 to 07,14 have foot.  ADDR 42.
0b00000001_00000001_00000001_00000001 //01, 01, 01, 01: 00,15 to 03,15 have foot.  ADDR 43.
0b00000001_00000001_00000000_00000000 //01, 01, 00, 00: 04,15 to 07,15 have foot.  ADDR 44.

Senbus.  Yes, I know, that's a terrible name.  I'm a programmer, ok?!

It's not a terribly efficient system - my estimates show the example character taking ~2.5k cycles to render, which at 60 FPS is already half the frame, or even at 20 FPS, roughly 1/6th the frame.  Even if a "room" was always 256x128 pixels, it could still easily take several seconds to render in all the entities, terrain, etc, to speak nothing of the time it might take to render large motile objects.  Go bigger than that (ah-la old Metroid games, Super Mario Bros, etc), and your performance simply disappears as soon as the room needs to scroll.  I KNOW there has to be a way to improve its efficiency, I'm just...currently a bit too daft to find it.  >.<

Some other ideas I can think of off the top of my head would be to use a list of individual pixels, if I had a max sprite size limit of 16x16, it would be feasible to make one that supports 2 pixels per line, in the form of CCCCCCCCXXXXYYYYCCCCCCCCXXXXYYYY, only rendering those individual pixels.  It'd cut down on some of the performance issues, since it outright skips transparent pixels, as well as the need to increment X and Y.  It'd be (in some cases) more space efficient (my math says the test character would take up only 38 addresses instead of the current 45), though it might lose out on some speed per loop cycle, it just compensates with having less cycles overall.  Image files would also likely read nicer.

Will update in replies if I make any breakthroughs.

(+1)

A sprite system? Wow, that would be really cool, and very useful for not just games but also other things that use graphics, e.g. icon-using window systems, or even for rendering text (using a sprite per char).


Regarding speed/efficiency, hmm. Well, let's do some math and figure out what our theoretical limits are, as that will help to discard options that aren't actually possible.

With a 256x128 screen, there's 32768 pixels, so (using the current screen device API) that's the theoretically minimum possible cycles a full-screen refresh would take.

At 300kHz, that means a theoretical maximum full-screen FPS of barely above 9.

In practice, unless the program is simply a long list of SETDATA instructions, it'll need to use more cycles than that - even my single-color-only fast loop needs 2 cycles per pixel, plus some more for setup for each frame.

If we assume that it will need at least 3 cycles per pixel to draw an actual image (which might still be low-balling it even as a best case), that's already down to about 3 FPS max - and it still doesn't include any of the processing needed to generate that image (such as game logic or sprite data decoding).

So, if you need 10 FPS or more for the game to be playable, full-screen refresh is simply a no-go even in theory, given this (virtual) hardware.

Thus, any game for this system that needs a decent FPS would have to be designed to not require full-screen refresh.

This also means that the basic way of doing sprites - always starting with a base image and then layering the sprites on top, and then blitting the result to the screen - is also a no-go. Even doing it directly to the screen instead of to an off-screen buffer would still involve a full-screen refresh (and then some more).

We'll therefore have to use something that only draws the pixels it needs to, to update the screen. Given a sprite system, that's not actually a major hindrance, since it would draw only the pixels of the sprites anyway, though we'll need to add erasing the previous sprites to that (or if the sprites move slowly, at least the parts of them that won't be drawn on top of.)


So. How many pixels can we actually do per frame, then?

Well, if we want 60 FPS, we only have 5k cycles per frame, so max 5k pixels - or somewhat more realistically, 5k/3 < 1667 pixels (which is less than a 41x41 square) - minus setup and game logic. So, at most ~5% of the screen can be updated each frame, even with that theoretical high-efficiency code.

At 20 FPS, we have 15k cycles per frame, which by the same logic gives us max 5k pixels (less than a 71x71 square), or ~15% of the screen, in that same very optimistic case.

Note that those numbers include any pixels that need to be erased (to show the background) because a sprite was moved away from there, and do not account for needing to handle transparency, do game logic, or any other overhead. So it's likely to be significantly less in practice.

So really, large moving objects (drawing over ~15% of screen size) at even 20 FPS are not really a matter of efficiency - they're simply not really possible on this hardware. Scrolling likewise unless it can be restricted to the same number of pixels actually changing. (I'll note that on some game machines, sprites/scrolling/etc. were implemented in hardware.)


Now, can we get around that limit?

Well, since the limit is on the number of pixels drawn, not where on the screen those pixels are, it might (depending on game and graphics style) be possible to have moving objects that cover a larger screen area - as long as only parts of it actually need to be redrawn for each frame.

E.g. if the large sprite is an enemy, if it's designed as a blob, with most of its internals being the same color, and the game knows this, then it can draw the whole thing once and then later update only its edges instead of the whole thing. (I actually did that in Snake, only drawing each end of the snake instead of the whole thing.)

I think that requires using special-case code instead of a generic sprite renderer, though, since the game needs to know about it to know that it can do it. Well... unless you encode that movement in the sprite data somehow and have the renderer use that for partial updates. But then I'm not sure if it's really just a sprite anymore... Although, I suppose it could be done indirectly by having several sprites - one for the initial rendering, then others for per-frame movement that only draw what needs to be redrawn for that movement. Though you'd still have to encode erasure areas somehow, unless the background is always the same color...


Regarding your algorithm pseudocode, I haven't studied it in detail, but noticed that it essentially has an inner loop for a small fixed-size set of elements (the 4 pixels per data line), on a system where inner loops (or multiple checks) is expensive (due to juggling of R2/R3), and there actually is a decent amount of RAM to put code in.

So, one easy way to optimize that would be to unroll that inner loop, instead having a copy of the inner drawing code for each of those 4 pixels. This saves both that R2/R3 juggling, and needing to calculate which bits to copy out (though that could probably be done by a bit shift instead), thus speeding things up - though at the cost of the code being longer. I do admit it limits us to sprite sizes that are multiples of 4, at least in the X direction, but I doubt that's much of a problem in practice.

However, your idea of instead using a list of pixels (including X/Y/C for each) would probably be even faster, not just due to skipping the transparent ones entirely, but because I think it would need less processing per pixel. With your proposed format, I'd say 6 cycles per pixel, plus 2 per data line, so 14 per 2 pixels (plus some initial setup per sprite, but I assume that's true for any method). Some games might shorten it by a couple cycles, though (i.e. if sprites can only be placed at multiples of their size).

setup:
    R2 = addr of first image word
    R3 = addr of last image word + 1
    R4 = X/Y position of sprite, encoded as pixel location for sprite's (0,0)
loop:
    R1 = read(R2)
    R2++
    two copies of this code, one for each pixel:
        R0 = 0
        PMOV color bits from R1 to R0
        PMOV X position from R1 to R0
        PMOV Y position from R1 to R0
        R0 += R4
        draw(R0)
    IFJMP to loop if R2 < R3


If even more speed is needed, and disk space can be sacrificed to gain it, we can easily get down to 4 cycles per pixel (including the data line reading) by using the very simple one-pixel-per-data-line format of your win screen image, and just adding the sprite location to each pixel (like above).

If there isn't enough disk space to store all the sprites that way, but only a few of the sprites are needed at the same time (e.g. different sprites on different levels), then one option to consider would be caching - setting aside an area of disk for the currently-used sprites, and decode from a compressed image format into that area before starting the level. This would increase the loading time of the level, but in return make it render faster, so you can do more while in the level.

It's actually possible to optimize this down to 7 cycles per 2 pixels if the screen size and color depth were changed to something that fits in 16 bits, as then it would be possible to just shift the second pixel into place with one PMOV instead of having to read and increment. But, that would mean making the screen smaller and having fewer colors, and if we do that, many of the calculations/conclusions above would have to be redone (since the hardware is different).


One thing I'll note is that none of these options have actually managed to reach 3 cycles per pixel, which suggests that the calculations above are actually on the optimistic side.

If we redo with 4 cycles per pixel, then 20 FPS works out to 15k/4 = 3750 pixels (less than a 62x62 square), or ~11.4% of the screen. Again, minus setup, game logic, etc.

Assuming your sprite is representative with drawing 72 of 128 possible pixels in its grid (57% fill), and assuming just 4 cycles of setup for each sprite (to make the math easier), that suggests a max of 3750/73 ~= 51 sprites can be drawn per frame, using the most time-efficient option we know of for this hardware.

Even if we have to erase all of them first, that would still leave us with 25 sprites per frame, which isn't too bad - many games can be done within that.

(Though the game logic will probably take away a good chunk of that, making it fewer, but maybe not all of the sprites need to be moved/redrawn every frame, either, which can compensate.)