Post by quyse in Content-based search (academic research collaboration)


A bit off-topic, but I just re-read the itch.io Terms of Service, in particular:

Publishers retain all ownership rights to the submitted content, and by submitting content to the Service, Publishers hereby grant the following:
...
To Users, a non-exclusive, perpetual license to access the content and to use, reproduce, distribute, display and perform such content as permitted through the functionality of the Service. Users shall retain a license to this content even after the content is removed from the Service.

That's interesting phrasing. It says that users are allowed to "distribute" games they download, i.e. essentially waiving copyright? Usually terms of service say quite the opposite; it looks like the words "you may not" are missing here :)

So I guess content scraping cannot be forbidden under such liberal terms of service, can it? Of course, there are still technical considerations, such as not flooding the website with requests. It would be nice to have official rules for scraping.

I once scraped the public information of itch.io games with ids < 200000 as an experiment. It took a few weeks or so (I used a very low request rate). I wanted to calculate some interesting stats about itch.io games, but abandoned the project for lack of time. Maybe I'll resume it someday. A few stats I got: the number of published games in this range was 125775, and the total size of their uploads was ~5.7 TB (I didn't download them, only collected metadata). Now I can see that the number of games is more than 200k, and the max game id is in the ~450k range. Quite a lot, but doable :)
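For a rough sense of scale, the figures above can be extrapolated to the current id range. This is only a back-of-envelope sketch: it assumes the share of ids that correspond to published games and the average upload size stay the same for higher ids, and it reads "5.7 Tb" as decimal terabytes. The function name `estimate` is made up for illustration.

```python
# Figures quoted in the post.
IDS_SCANNED = 200_000          # ids below 200000 were checked
GAMES_FOUND = 125_775          # published games found in that range
TOTAL_UPLOAD_BYTES = 5.7e12    # ~5.7 TB (assumption: decimal terabytes)
MAX_ID_NOW = 450_000           # "max game id is in ~450k range"


def estimate(ids_scanned, games_found, total_bytes, max_id):
    """Extrapolate game count and upload size to a larger id range.

    Assumption: hit rate and average upload size are the same for all ids.
    """
    hit_rate = games_found / ids_scanned      # share of ids that are published games
    avg_bytes = total_bytes / games_found     # average upload size per game
    est_games = hit_rate * max_id
    est_bytes = est_games * avg_bytes
    return hit_rate, avg_bytes, est_games, est_bytes


hit_rate, avg_bytes, est_games, est_bytes = estimate(
    IDS_SCANNED, GAMES_FOUND, TOTAL_UPLOAD_BYTES, MAX_ID_NOW
)
print(f"hit rate:            {hit_rate:.1%}")
print(f"average upload size: {avg_bytes / 1e6:.1f} MB")
print(f"estimated games:     {est_games:,.0f}")
print(f"estimated uploads:   {est_bytes / 1e12:.1f} TB")
```

Under these assumptions, roughly 63% of ids are published games, the average game has about 45 MB of uploads, and the full range would hold on the order of 280k games and about 13 TB, which fits the "more than 200k" observation.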


No Time To Play, 7 years ago

We can't grant users a license to waive copyright. Of course not. What we can do is assure users that once they bought a game they'll retain access to it even if it's later withdrawn from sale, so they can continue to download and play what they paid for. Otherwise you end up in situations like the Microsoft e-book store.

As for content scraping, do we need to make rules about it now? We were hoping not to need rules for every little thing. Thank you for doing this responsibly last time.


quyse, 7 years ago

> We can't grant users a license to waive copyright.

Yeah I know, that's why I said "essentially". Probably better to say "effectively". You cannot waive copyright, but you can require publishers to grant users a license with almost the same rights as copyright gives. Specifically, "license to distribute" sounds like it officially allows unlimited copying or pirating. I'm not sure I understand this right; I'm not a lawyer, I just saw it in the terms and raised a question :)

> As for content scraping, do we need to make rules about it now?

Ah, no, I don't really need rules. I guess being reasonable is enough. Although I would appreciate at least some informal guidance, e.g. what the recommended request rate for itch.io is, or whether it's OK to download the games themselves. Downloading and analyzing terabytes of data is not that costly nowadays, especially considering that incoming traffic on clouds is usually free, but obviously itch.io would bear some cost for its outgoing traffic and CDN charges.
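In the absence of official guidance, a "very low request rate" can be enforced with a simple throttle. The sketch below is a generic illustration, not the code actually used for the scrape: `Throttle`, `crawl` and the `fetch` callback are hypothetical names, and the interval value is a placeholder, not a rate recommended by itch.io.

```python
import time


class Throttle:
    """Enforces at least `min_interval` seconds between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl(ids, fetch, throttle):
    """Call fetch(game_id) for each id, never faster than the throttle allows.

    `fetch` is a hypothetical callback that retrieves one game's public metadata.
    """
    results = {}
    for game_id in ids:
        throttle.wait()
        results[game_id] = fetch(game_id)
    return results


if __name__ == "__main__":
    # Stand-in fetch that makes no network requests; the interval is illustrative.
    demo = crawl(range(1, 4), fetch=lambda i: {"id": i}, throttle=Throttle(0.1))
    print(demo)
```

With an interval of around 9 seconds per request, 200000 ids would take about three weeks, which is in line with the "few weeks" mentioned earlier in the thread.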

To be clear, I'm not working on this project right now, and it's not that serious. What's implemented (quite a while ago) is this: https://itchy.quyse.io/ - a simple analyzer of publicly available itch.io projects. Example analysis (of the game I'm working on): https://itchy.quyse.io/game/19957. The idea is to expand the analysis to all itch.io projects (rather than doing it on demand), and gather statistics about game binaries/resources, like what percentage of all games use Unity, or what the minimal required Linux distributions are for games with a Linux version, etc. I hope I'll get to work on it as a weekend project at some point.
