I also created a kind of 16-bit assembly language with built-in support for I/O ports. (The 16-bit limit comes from the ROM capabilities: it's very inconvenient to have to look through 2 ROMs for 1 line of code, otherwise I'd have made a 32-bit CPU.)
There is 1 designated IO bit in the instruction format which, when set, replaces the Y input of the ALU with port input. The port is specified by bits 8-e (with the bits numbered 0-f in hexadecimal), which is 7 bits, so 128 possible ports.
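Here's a rough Python sketch of that decode step, just to make the bit layout concrete. The position of the IO bit (I put it at bit f here), the `ports` lookup, and the function names are all assumptions for illustration, not the actual hardware layout:

```python
# Assumed layout of one 16-bit instruction word (bits numbered 0-f):
#   bit f     -> IO flag (position is a guess, it isn't pinned down above)
#   bits 8-e  -> 7-bit port / hardware ID
IO_BIT = 0xF
PORT_SHIFT = 0x8
PORT_MASK = 0x7F  # 7 bits wide -> 128 possible IDs


def decode_io(instr):
    """Return (io_enabled, port_id) for a 16-bit instruction word."""
    instr &= 0xFFFF  # keep only the low 16 bits
    io_enabled = bool((instr >> IO_BIT) & 1)
    port_id = (instr >> PORT_SHIFT) & PORT_MASK
    return io_enabled, port_id


def alu_y_input(instr, y_value, ports):
    """Pick the ALU's Y operand: port input if the IO flag is set, else the normal Y value."""
    io_enabled, port_id = decode_io(instr)
    return ports[port_id] if io_enabled else y_value
```

So with this made-up layout, `alu_y_input(0x8300, 42, {3: 99})` reads from port 3 instead of using the normal Y value.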
So this version of the assembly can handle up to 128 different IPO (Input, Processing, or Output) device hardware IDs... and those devices could be built in as well!
The only problem is that I have to decide which is more reasonable: having 128 different IPO devices, each connected ONCE to the CPU... or having 64 different IPO devices, each connected at most 2 times to the CPU.
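To compare the two options, here's a small Python sketch of how the same 7-bit field could be read either way. Using the top bit as the "which copy" selector is just one possible split that I'm assuming here, and the function names are made up:

```python
def split_flat(field):
    """Option A: all 7 bits are the device ID (128 devices, one of each)."""
    return field & 0x7F, 0  # instance is always 0


def split_paired(field):
    """Option B: low 6 bits are the device ID, top bit picks copy 0 or 1 (assumed split)."""
    device_id = field & 0x3F       # 6 bits -> 64 device types
    instance = (field >> 6) & 1    # 1 bit  -> up to 2 copies of each
    return device_id, instance
```

For example, the field value `0x45` would be device 69 under option A, but the second copy of device 5 under option B.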
With 128, I could have a looooooooad of different devices. With 64, I could get 2 GPUs...
But I'm not even multithreading this CPU or whatever. It's got 1 core, and I should start learning about multiprocessors before I continue my work. If I get all this figured out, I might upload a video showcasing it all.
If I had 32 bits to address, I could basically get a supercomputer that runs on 32 bits. Isn't that fun?