- Your stuff got scraped.
So you know that AI is based on stolen art, and you still use it.
Perhaps my point of view is different. I grew up in a time when it was common knowledge that there are no security guarantees for anything you put up on the internet--which is quite literally the most public forum in all of recorded history. Scraping cannot bypass things like password protected pages. So that means anything that was scraped was truly, literally open to the public in some way. That is, anyone could type in the link and look at it. Anyone can right-click and hit "save image" or even screenshot then copy/paste. I know this. You know this. We all know this. It's known whenever we put something up on the world wide web. In truth, this is like finding out there really was a boogie man all along. We were warned all our lives.
Also, with or without that scraped art, AI would still be able to produce that exact art. Because thinking AI is referring back to any data set after it has been trained means you haven't yet gone far enough in your reasoning. This isn't pattern recognition, no matter who tries to tell you it is. AI is actually 3 things in one--and one of those things is a neural net. It's literally a simulated human brain, copied from actual human brains (read how we got neural net tech, no joke). So if we understand brains as well as we think we do, then this thing literally learns and thinks and reasons just like you and I do. So before I can take the stealing argument seriously and whole-heartedly, I have to ask:
What is stealing versus inspiration? You know what I mean?
Scraping cannot bypass things like password protected pages. So that means anything that was scraped was truly, literally open to the public in some way.
Untrue. This statement is years out of date.
It’s hard to believe you haven’t seen by now any of the reports of private information being scraped during data breaches and unethical policy changes to be used in “machine learning” paid for by the big AI companies. The mass scraping of everything, including sensitive documents on personal devices or in “secure” storage, is one of the major sources of anger and frustration about this subject.
It’s important to remember, too, why “everything online is public” is a defeatist response to hugely controversial actions (repeatedly judged illegal in courts) by Google, Facebook, and oppressive government agencies.
I’m old enough to remember when web crawlers were only supposed to visit with permission. But then Google took over web searching and made people afraid not to allow its indexing. Then the company argued it had to be allowed to copy and store web page contents, including any and all images, to return search results. Then, oops, they were using copyrighted images the whole time for massively secretive machine learning projects with almost no ethical oversight, all while heavily lobbying politicians for control over relevant laws and enclosing millions to billions of people into spying tech.
There was almost no honest consent of use in these processes.
Also, with or without that scraped art, AI would still be able to produce that exact art.
Nowhere close to true. Many of the generative models out there are elaborate plagiarism machines.
Look at the official complaints for why stopping scraping would destroy the AI/ML/chatbot industry.
Or look at what happens when scraped images are “poisoned” with invisible noise. The anti-AI tools wouldn’t be nearly as effective as they are if the generative models could reproduce the images on their own.