Where did the pixel art in the generator’s dataset come from?

I’ve seen AI images here on Itch.io that appear to have been based on the actual art made by people I know from other sites. I’ve been wondering if that’s somehow a coincidence.

I’ve also seen pixel artists give up on sharing any art online because the constant scraping of all our work wore them out. Other pixel artists who used to lead drawing events pulled back behind paywalls to reduce abuse. With what I’ve witnessed in other art forms, it’s hard to believe the pixel “art” programs weren’t built on as much stolen art as the other types were.

But it would be a tiny amount of comfort (with how the AI industry is burning up our planet, polluting essential resources for quick text-to-image-to-text and chatbot production, and spreading harmful propaganda developed by artist-hating fascists) if these responses from other pixel artists were from paranoia, not actual abuse.

(I know my writing, photos, and graphic designs were scraped without my consent. I’m not sure whether my pixel art was.)

"I’ve been wondering if that’s somehow a coincidence."

See, the fun thing about AI is that nobody can ever say for sure if it is a coincidence or not. Therefore, it is completely morally acceptable!!1!

"Where did the pixel art in the generator’s dataset come from?"
That's not how they work. The training data is off somewhere in a lab. By the time you are generating stuff, the AI has a fully formed neural network.

Think of it like this (metaphorically, but also literally, functionally speaking):

- Your stuff got scraped.

- It was ripped apart and "chunked" into a special format that the AI brain can comprehend.

- It was fed to the neural network, which magically understands it (not BSing you, this was not programmed in, kind of a big deal no one talks about)

- Then it was able to be "trained" through positive and negative reinforcement feedback because it somehow magically understands the concept of positive and negative want/desire/whatever you wanna call it.

- When a picture of your scraped art looks like a picture of your scraped art, it's ready to go.

- That AI is copied, and its copies are crystallized in a permanent state. Hence GPT 4.0, GPT 5.0, etc. The frozen copy is sent out for use. No one ever gets to see or interact with the mother AI except the people that keep it, there with your scraped art data.

- When the public generates an image, the AI creates a field of energy that holds everything there could possibly be (according to what it knows). It then literally imagines or dreams the requested image as best as it can until it realizes it amidst the chaos and locks onto it. Then, when it dies, all the static falls away as the field collapses and what's left is retrieved as data and reconstituted into the image the public sees when they click generate.

- Once an AI is out and in use, it can no longer be given any new information, nor does it have access to the information on which it was trained. It can only have neural pathways strengthened to make certain outcomes more likely. When companies have important info leaked, it is due to 3rd party "training" and "memory" brute-force re-application of data, which creates a weak point for attack.
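
A rough sketch of the train-then-freeze idea in the list above, in Python. This is a toy two-number model, nothing like a real image generator, and the names (`train`, `FrozenModel`) are made up for illustration. It only shows that the copy sent out for use holds learned numbers rather than the training files; it says nothing either way about how much of the training data those numbers can reproduce.

```python
from dataclasses import dataclass


def train(samples, steps=2000, lr=0.05):
    """Fit y = w * x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


@dataclass(frozen=True)
class FrozenModel:
    """Hypothetical 'shipped' copy: it stores only the learned numbers."""
    w: float
    b: float

    def predict(self, x):
        return self.w * x + self.b


# Toy "scraped" data that follows y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train(training_data)
model = FrozenModel(w, b)

# The dataset is gone; the frozen copy still answers from its weights alone.
del training_data
print(model.predict(10.0))
```

Trying to assign to `model.w` afterwards raises an error, which is the toy version of the "crystallized" copy: any further change means training a new model and shipping a new copy.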

"Your stuff got scraped."

So you know that AI is based on stolen art, and you still use it.

Perhaps my point of view is different. I grew up in a time when it was common knowledge that there are no security guarantees for anything you put up on the internet--which is quite literally the most public forum in all of recorded history. Scraping cannot bypass things like password protected pages. So that means anything that was scraped was truly, literally open to the public in some way. That is, anyone could type in the link and look at it. Anyone can right-click and hit "save image" or even screenshot then copy/paste. I know this. You know this. We all know this. It's known whenever we put something up on the world wide web. In truth, this is like finding out there really was a boogie man all along. We were warned all our lives.

Also, with or without that scraped art, AI would still be able to produce that exact art. Because thinking AI is referring back to any data set after it has been trained means you haven't yet gone far enough in your reasoning. This isn't pattern recognition, no matter who tries to tell you it is. AI is actually 3 things in one--and one of those things is a neural net. It's literally a simulated human brain, copied from actual human brains (read how we got neural net tech, no joke). So if we understand brains as well as we think we do, then this thing literally learns and thinks and reasons just like you and I do. So before I can take the stealing argument seriously and whole-heartedly, I have to ask:

What is stealing versus inspiration? You know what I mean?

"Scraping cannot bypass things like password protected pages. So that means anything that was scraped was truly, literally open to the public in some way."

Untrue. This statement is years out of date.

It’s hard to believe you haven’t seen by now any of the reports of private information being scraped during data breaches and unethical policy changes to be used in “machine learning” paid for by the big AI companies. The mass scraping of everything, including sensitive documents on personal devices or in “secure” storage, is one of the major sources of anger and frustration about this subject.

It’s important to remember, too, why “everything online is public” is a defeatist response to hugely controversial actions (repeatedly judged illegal in courts) by Google, Facebook, and oppressive government agencies.

I’m old enough to remember when web crawlers were only supposed to visit with permission. But then Google took over web searching and made people afraid not to allow its indexing. Then the company argued it had to be allowed to copy and store web page contents, including any and all images, to return search results. Then, oops, they were using copyrighted images the whole time for massively secretive machine learning projects with almost no ethical oversight, all while heavily lobbying politicians for control over relevant laws and enclosing millions to billions of people into spying tech.

There was almost no honest consent of use in these processes.

"Also, with or without that scraped art, AI would still be able to produce that exact art."

Nowhere close to true. Many of the generative models out there are elaborate plagiarism machines.

Look at the official complaints for why stopping scraping would destroy the AI/ML/chatbot industry.

Or look at what happened when scraped images were “poisoned” with invisible noise. The anti-AI tools wouldn’t be nearly as effective as they are if the generative models could reproduce the images on their own.
