Post by CliffracerX in Code sharing megathread

Viewing post in Code sharing megathread

While I can understand the allure of using registers, especially for simple operations (int X = int Y, for instance), I figured RAM might be better overall since it means you can have as many "arguments" as you need, especially if you get into custom datatypes (like a struct) that could easily balloon past 7 or 8 registers being required for even a simple X=Y op. That is, however, coming from my own implicit biases, and lack of optimization skill, so I don't see it as being HUGELY unlikely that I'm wayyy off the mark in how best to do that. (Also - you almost certainly know way more about what goes on under the hood of modern computers than I do. Knowing how computers work is probably the more important factor for compiler development than having a tiny bit of research into how compilers work. :P)

MOVR basically seems like a one-off to get out of needing to write a bunch of self-modifying code just to move one memory address' contents through RAM - while it can be used for some small improvements here and there, it's not necessarily worth taking up a whole sub-code slot for. Your suggestion for alternate MOVI/MOVO modes is probably the more practical thing to add, even if it means an extra clock cycle or two.

OFST is definitely worth keeping - it's all-around useful, and in the case of the multitasking kernel I wrote a while back, mandatory for functionality. Far as I can tell, there's no way to make a program that'll live at an offset without OFST - you can't just self-modify the MOVI/MOVO ops to splice the offset in, because self-modifying code would need that offset in the first place.

(Also, now I'm tempted to make an OFST variant for Custom Mode that lets you set up your own MMUs. The code-creep is getting out of hand, there're just too many cool things to add!)

My thinking for using that separate mode is that it means the current feel (and perhaps charm) of the TC-06 can be preserved with ease, while also easily enabling access to a more practical processor (or less practical, if you so desire - experimenting with an online circuit simulator & making a byte of RAM inspired me to make a byte-based lower-fi TC-06 derivative, the specs for which will go up on the Gitlab at some point soon) for doing fancy stuff.

I'm aware Senbir is my game, but I also feel like it has a certain charm to its terribleness, and I'm worried adding a new-opcode (e.g, "RAMC <4-bit sub-code for practical RAM-based variants of all the main codes> <24 bits of arguments for that sub-code>") would potentially mess that charm up, or disturb the overall feel of the TC-06 & how it works, hence asking about it - don't wanna ruin it for my playerbase! I do definitely want to look into making a more practical RAM-based architecture with less reliance on registers (maybe use them more as op-code arguments/variables than they are now?) down the line, just to explore what it'd be like.

(Also: no, TC-06 isn't really an acronym for anything. It's sorta loosely based around "The 006 Cooperative", but mostly just serves as a cool abbreviation-sounding processor name. :P

Also, sorry if this post is a bit discombobulated, it's six in the morning. Am not as smart as I should be/could be.)

edorfaus5 years ago (2 edits)

Regarding MOVI/MOVO variants taking more clock cycles than MOVR, well, not necessarily, because they're more flexible and so can be used to do more parts of the job.

Consider the first version of your SET operation, with R0 pointing to the arguments - your version with MOVR takes 14 cycles, while with the MOVI/MOVO variants it could be done in 5, without self-modification (thus not needing OFST), and using one register less:

MOVIR 4 0   // R4 = mem[R0] : read the source address into reg 4
MOVIR 2 4   // R2 = mem[R4] : read the data from the source address
MATH 15 0 0 // R0++         : increment argument pointer
MOVIR 4 0   // R4 = mem[R0] : read the destination address into reg 4
MOVOR 2 4   // mem[R4] = R2 : write the data to the destination address

If the MOVI/MOVO variants supported both register and immediate at the same time, it could be done in 4, since then R0 doesn't need to be incremented.

For version 2, though, you're correct that it would take an extra cycle - though that assumes we're not counting the cycles required to move the pointers into those particular memory locations from wherever they were originally. That may or may not be reasonable, depending on the larger context.

(Btw, unlike MOVR, MOVIR/MOVOR could be used for your ADD operation as well.)

Of course, that flexibility also means they remove far more of the need for self-modifying code, so it depends on what you want the game to be - MOVR is somewhat less of a change in that respect.

Regarding taking up a sub-code slot, that's not in itself a problem, since there's really lots of space available - worst-case, you can not only do sub-sub-subcodes, but make multi-word instructions. (As in, the instruction at address N takes arguments from address N+1, and sets the next address to be executed to N+2 instead of N+1. Those arguments could even themselves be subcodes.) Consider e.g. the x86 family - it has 1-byte instructions, but from what I've read, it also has instructions that are up to 15 bytes long. (But then, it's got thousands of instructions - exactly how many depends on how you count them...)

Of course, an indirect problem with that is having to implement and support all those instructions... I'd say that would be a better reason to not add them without a decent reason for having them.

Regarding functions using registers or RAM, having many/large arguments, etc. - one thing to remember is that there's a major difference between language-level functions and the kind of compiler-operation-level functions I believe we were talking about here.

The language-level functions are the kind the programmer specifies in the higher-level language they're writing in (C, C#, whatever), with whatever parameters etc. they specify, and executing whatever code the programmer wrote - they're a part of the source code that is the input to the compiler.

At the CPU level, execution of these functions is generally started by a jump from whatever code called them, and once they're done executing, there's another jump to return to the calling code. Where to put the return address, arguments and return value (if any) for the function differs between architectures and languages and is part of what's generally termed the calling conventions of that architecture/language. Here, using RAM for arguments often makes a lot of sense (often in the form of a call stack).

The compiler-operation-level functions, on the other hand, are (I assume) used by the compiler to generate the assembly code that implements a given language-level function - and so are defined by the compiler itself (fixed parameters etc.), and basically return some assembly code that implements the operation that particular function is for. I'd say they are probably more accurately thought of as templates, rather than functions, since they don't directly execute the operation, but return the code for executing it (to be put into the generated program).

At the CPU level, in the program the compiler generated, those functions (or rather the blocks of code they returned) are generally reached without any jumps, and don't jump when they're done; instead execution simply falls through from one operation to the next, since the compiler placed copies of their code into the output, one after the other, in such a way that the combination does what the higher-level language described. (Doing it with function calls (jumps), while possible, would be much slower and would probably end up taking more space.)

This leads to a fairly obvious optimization, where if the code for one operation loads a value from memory into a register to do something with it, and the next operation also needs that same value (or the result of the first operation) to be in a register to do something with it, then it's rather wasteful to load it from RAM again, since it's already present in a register. (Whether that value came from an argument to the language-level function is rather immaterial at that point.) However, that does require some careful tracking of exactly which value is stored where. (This is the optimization I meant decent compilers typically do, as it's a relatively easy and effective one - though not quite as important for the TC-06 since its RAM is very fast.)

By the way, at the compiler-operation level, I'd guess there's probably no such thing as a struct - by that time, the compiler has probably already transformed operations on a struct into operations on either its component fields or the block of memory that holds it, depending on what is being done with it. At the CPU level, copying a struct (instead of just the pointer to one) generally means copying either its fields one by one, or (often quicker/easier) the entire block of memory that holds the struct (which can be done with a tight loop) - and the compiler knows this and generates the assembly code accordingly, based on the low-level operations it has available for the target architecture.

Regarding OFST and programs living at an offset without it - this can be done by pushing (some of) the responsibility into the program loader. (As an example, I know that GNU/Linux does something kinda like this, though the details are different of course.)

Basically, your kernel (or the OS around it) necessarily has some piece of code that loads a program from disk into memory to be executed, right? And that loader necessarily knows which offset into memory it is loading that program into. So, in addition to loading the program into memory, that loader could also update (parts of) the program's code to work at that offset, and tell it what that offset is so it can do whatever else it needs to do to adapt itself.

As a simple example, the loader could update every MOVI and MOVO instruction it loads by adding the offset to the address stored in that instruction - and then set a register, let's say R14, to the offset before starting execution of the program. That would make the MOVI and MOVO instructions already point at the right places, and if the program does any self-modification, it can use R14 to adjust how it does that - and since the MOVO instructions are already updated, it can easily save the other modified instructions in the right places.

Of course, a fancier loader could also update any other instructions that are known to contain absolute memory addresses, or the program file could contain a header that tells the loader to update specific other locations as well (e.g. DATACs) that it can't auto-detect - but I think updating MOVI/MOVO is enough for things to work.

Technically, R14 isn't needed in this scenario - the program could MOVI one of its MOVI/MOVO instructions and grab the offset from that - but it makes things easier, and the loader probably has the offset easily available anyway, so why not?

Technically in the other direction, I think that if the offset is given in a register (or PCSR 0 does what I think it does), then even just updating a single MOVO would be enough (could even be on a fixed address), but that would make things harder on the program. It's probably better to make the loader do more, to avoid having to duplicate that effort in every program.

Come to think of it, another solution would be to have an OS function at a known fixed absolute address, that basically does a relative MOVO - call it with a relative target address and return address, and the OS function adds the offset to those addresses before performing the MOVO operation and returning to the program. If given an equivalent MOVI-performing function, or some way to find its offset (whether R14 or PCSR 0), I think it can do everything it needs to. Though in a somewhat more complicated (and slower) manner from the program's point of view, so the other solution is probably nicer.

Regarding the MMU, I would suggest adding an MMU device (for a GETDATA/SETDATA port) rather than adding CPU instructions for its functions. Mainly because that makes it easier/more sensible for some game modes to not have it, or for custom modes to have different models that work differently or have different APIs, without the CPU's instruction set having to be different. (Though also because I see it conceptually as a separate device that happens to sit between the CPU and the RAM, even though most (but not all) CPUs these days apparently integrate it into the same silicon chip.)

Based on just a little bit of googling, it's apparently a pretty complex device, with enough possible options and variations (segmentation vs paging, page sizes, PTE levels, APIs for process switching, etc.), that I don't think we're likely to get it perfect first try - if there even is such a thing as perfect here. There are trade-offs involved. So being able to experiment with different variations with otherwise equivalent computers would probably be nice.

(One tutorial I found basically said that, like everything in programming, the only way to really understand it is to play with minimal examples - but that the minimal example for using that MMU is rather large because it involves making your own small OS... On the other hand, a tutorial for a different MMU was much smaller and simpler, so it apparently depends.)

It sounds to me like you rather like the original ISA, in part because of its limitations, and therefore don't really want to loosen those limitations up much (if at all) - but at the same time, want to be able to do more advanced stuff without having all of those limitations making it extra hard. Which suggests that having them be runtime-optional is probably the best solution. And implementing that as separate modes (or options for custom modes) sounds to me like a good way of doing that.

I may be biased in that assessment, though, as personally, I think that the limitations are part of the challenge, and that it probably wouldn't be quite as fun to do those early levels (or some of the other things I've done) without them, as it would be rather too easy. But at the same time, I agree that they can get rather annoying when trying to do more complex things later. And thus, that it would be nice to have options to enable choosing the appropriate difficulty for any given idea. (Also, I tend to go for providing options and flexibility in general, making frameworks/toolkits more than products, in part to avoid having to choose for others. This tends to cause a certain amount of overengineering on my part... Like designing my simulator so that the CPU's instructions can be replaced/reconfigured individually...)

(Also, I'm rather weird in several ways, and not really a people person - so I don't really think I should try to speak for players in general.)

Come to think of it, something like this could also be integrated as part of the main game - essentially having things be unlocked as the player goes along and reaches more difficult levels, kind of like achievements.
E.g. after finishing some basic screen manipulation levels, "Re-reading the documentation with your newfound experience, you find you now understand a part you didn't before - which shows that there's a way to change the resolution and color depth of the screen." and continue with some levels that use the higher settings.
Or when you start needing better instructions, "You suddenly notice that some pages in the manual had gotten stuck together. Carefully peeling them apart, you discover that there are some more instructions that you didn't know about before."
Or maybe, "You discover a small box tucked away at the back of the computer. It turns out to contain a memory upgrade and a better processor."
Or "Studying the internals of the computer, you notice there's a jumper marked "debug mode" on one side and "normal mode" on the other - currently set to debug mode. After changing its position, the computer starts to run much faster!"
Etc.

Of course, that kind of depends on thinking up some more levels that would make for a decent progression...

Eh, no worries. This post is probably no better. *Looks at clock* ... ~8am... Ok... time to go to bed I guess... Just gonna post this first... *shakes head* (... ok, ready to post finally... *checks* 08:45... *sigh*)

itch.io

Viewing post in Code sharing megathread