
Scraping cannot bypass things like password protected pages. So that means anything that was scraped was truly, literally open to the public in some way.

Untrue. This statement is years out of date.

It’s hard to believe that by now you haven’t seen any of the reports of private information being scraped through data breaches and unethical policy changes, then used in “machine learning” paid for by the big AI companies. The mass scraping of everything, including sensitive documents on personal devices or in “secure” storage, is one of the major sources of anger and frustration about this subject.

It’s important to remember, too, why “everything online is public” is a defeatist response to hugely controversial actions (repeatedly judged illegal in courts) by Google, Facebook, and oppressive government agencies.

I’m old enough to remember when web crawlers were only supposed to visit with permission. But then Google took over web searching and made people afraid not to allow its indexing. Then the company argued it had to be allowed to copy and store web page contents, including any and all images, to return search results. Then, oops, they were using copyrighted images the whole time for massively secretive machine learning projects with almost no ethical oversight, all while heavily lobbying politicians for control over relevant laws and locking millions to billions of people into spying tech.

There was almost no honest consent to use in these processes.

Also, with or without that scraped art, AI would still be able to produce that exact art.

Nowhere close to true. Many of the generative models out there are elaborate plagiarism machines.

Look at the official complaints for why stopping scraping would destroy the AI/ML/chatbot industry.

Or look at what happens when scraped images are “poisoned” with invisible noise. The anti-AI tools wouldn’t be nearly as effective as they are if the generative models could reproduce the images on their own.