"Where did the pixel art in the generator’s dataset come from?"
That's not how they work. The training data is off somewhere in a lab. By the time you're generating stuff, the AI is a fully trained neural network, and it isn't digging through that dataset to make your picture.
Think of it like this (simplified, but functionally accurate):
- Your stuff got scraped.
- It was ripped apart and "chunked" into a format the AI can process: images get resized and turned into grids of numbers, and their captions get split into tokens.
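- It was fed to the neural network, which starts out as a pile of random numbers (its "weights") and learns the patterns from the data. Nobody programs in what a cat or pixel art looks like; that knowledge emerges from training (not BSing you, this is kind of a big deal no one talks about).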
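- Then it was "trained": every time its output was off, a number called the loss measured how wrong it was, and the weights were nudged slightly in whatever direction made that error smaller, over and over, millions of times. It doesn't actually want or desire anything; the "positive and negative feedback" is just math pushing numbers toward less error.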
- When its outputs look convincingly like the kind of images it was trained on, your scraped art included, it's considered ready to go.
- That AI is copied, and its weights are frozen in a permanent state. Hence version names like GPT-4, GPT-5, etc.: each one is a frozen snapshot that gets sent out for use. The public never gets to see or interact with the original training setup; that stays with the people who run it, along with your scraped art data.
- When the public generates an image with a diffusion-based generator (such as Stable Diffusion), the AI starts from a canvas of pure random static. Over a series of steps, it guesses which parts of that static don't belong in an image matching the prompt and strips a little of it away each time, using only the patterns frozen into its weights. As the static falls away, the requested image gradually emerges from the chaos, and the result is converted into the picture the public sees when they click generate.
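- Once an AI is out and in use, its weights can't absorb new information on their own, and it can't look anything up in the data it was trained on (though images that appeared many times in the training data can sometimes be partially memorized and reproduced). The only way to change it is more training ("fine-tuning"), which strengthens certain neural pathways to make certain outcomes more likely and effectively produces a new version. When companies have had important info leaked through AI, it's generally due to data being fed back in on top of the frozen model, through third-party "training," "memory" features, and brute-force re-application of data, which creates a weak point for attack.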
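The "chunking" step can be sketched roughly like this. It's a minimal illustration, assuming a tiny grayscale image and one common scheme (scale pixel values into the range -1 to 1, then cut the image into small square patches, the way vision transformers do). Real generators each use their own format, and `to_patches` is a hypothetical helper, not any library's API.

```python
def to_patches(image, patch=2):
    """Hypothetical helper: scale 0-255 grayscale pixels to floats in [-1, 1]
    and cut the grid into patch x patch tiles, each flattened into a list."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    scaled = [[p / 127.5 - 1.0 for p in row] for row in image]
    patches = []
    for r in range(0, h, patch):          # walk tiles top to bottom
        for c in range(0, w, patch):      # and left to right
            patches.append([scaled[r + i][c + j]
                            for i in range(patch) for j in range(patch)])
    return patches

# A 4x4 "pixel art" image: 0 = black, 255 = white.
image = [[0, 255, 0, 255],
         [255, 0, 255, 0],
         [0, 0, 255, 255],
         [0, 0, 255, 255]]
patches = to_patches(image)
print(len(patches), patches[0])  # prints 4 [-1.0, 1.0, 1.0, -1.0]
```

After this step the network only ever sees lists of numbers like these, not "your art" as a file.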
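The training, freezing, and fine-tuning steps can be sketched with a toy "model" that has two weights instead of billions. This is a minimal sketch, assuming plain gradient descent on mean squared error; `train` and `predict` are hypothetical helpers, and real image generators use far bigger networks and different losses. It shows the weights starting out random, getting nudged to shrink the error, still working after the training data is deleted, and fine-tuning producing a new version instead of changing the old one.

```python
import random

def train(data, steps=2000, lr=0.05, init=None, seed=0):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    rng = random.Random(seed)
    # Start from random weights, or from an existing model when fine-tuning.
    w, b = init if init is not None else (rng.uniform(-1, 1), rng.uniform(-1, 1))
    n = len(data)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y      # how wrong the current guess is
            grad_w += 2 * err * x / n  # slope of the loss with respect to w
            grad_b += 2 * err / n      # slope of the loss with respect to b
        w -= lr * grad_w               # nudge each weight to shrink the error
        b -= lr * grad_b
    return (w, b)                      # an immutable tuple: the "frozen" model

def predict(model, x):
    w, b = model
    return w * x + b

data = [(x, 3 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
v1 = train(data)
del data                               # v1 keeps working without the data
print(round(predict(v1, 4.0), 3))      # prints 13.0

# "Fine-tuning": start from v1 and train on new data, which yields a new version.
v2 = train([(x, 3 * x + 2) for x in (0.0, 1.0, 2.0)], init=v1)
print(round(predict(v2, 4.0), 3), round(predict(v1, 4.0), 3))  # prints 14.0 13.0
```

Notice that `v1` is just two numbers: nothing in it lets you look up the original `data`, and fine-tuning leaves `v1` itself untouched.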
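The generation step can be sketched as a toy diffusion sampling loop. This assumes a deterministic DDIM-style update and a made-up linear noise schedule, and the "trained network" is faked by `guess_clean`, a hypothetical stand-in that only knows one crude rule (every pixel is one of the three values in `PALETTE`). Nothing here is a real model; it just shows static being stripped away step by step, with different random seeds typically giving different images from the same learned rule rather than a stored picture being retrieved.

```python
import math
import random

PALETTE = (0.0, 0.5, 1.0)  # hypothetical: the only "style rule" this toy model knows

def guess_clean(x, abar):
    """Stand-in for the trained network: estimate the clean image from a noisy one.
    A real model does this with billions of learned weights; here we undo the
    signal scaling and snap each pixel to the nearest palette value."""
    return [min(PALETTE, key=lambda p: abs(p - v / math.sqrt(abar))) for v in x]

def generate(num_pixels=16, steps=50, seed=None):
    rng = random.Random(seed)
    # abar[t] = how much of the clean signal survives at step t (1.0 = no noise).
    abar = [1.0 - 0.999 * t / steps for t in range(steps + 1)]
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # start from pure static
    for t in range(steps, 0, -1):
        x0 = guess_clean(x, abar[t])
        # Whatever the clean guess doesn't explain is treated as noise.
        noise = [(xi - math.sqrt(abar[t]) * ci) / math.sqrt(1.0 - abar[t])
                 for xi, ci in zip(x, x0)]
        # DDIM-style step: remix the guess and the noise at the next, lower noise level.
        x = [math.sqrt(abar[t - 1]) * ci + math.sqrt(1.0 - abar[t - 1]) * ni
             for ci, ni in zip(x0, noise)]
    return x  # at the last step abar is 1.0, so only the clean guess remains

print(generate(seed=1))
print(generate(seed=2))
```

The design choice worth noticing: the loop never consults any stored image, only the rule baked into `guess_clean`, which plays the role of the frozen weights.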