A bit off-topic, but I just re-read the itch.io Terms of Service, in particular:
Publishers retain all ownership rights to the submitted content, and by submitting content to the Service, Publishers hereby grant the following:
To Users, a non-exclusive, perpetual license to access the content and to use, reproduce, distribute, display and perform such content as permitted through the functionality of the Service. Users shall retain a license to this content even after the content is removed from the Service.
That's interesting phrasing. It says that users are allowed to "distribute" the games they download, which sounds like it essentially waives copyright? Usually terms of service say quite the opposite; it looks like the words "you may not" are missing here :)
So I guess content scraping can't really be forbidden under such liberal terms of service, can it? Of course, there are still technical considerations, like not flooding the website with requests. It would be nice to have official rules for scraping.
I once scraped the public information of itch.io games with game ids < 200000 as an experiment. It took a few weeks or so (I used a very low request rate). I wanted to calculate some interesting stats about itch.io games, but abandoned the project for lack of time. Maybe I'll resume it someday. A few stats I got: the number of published games in this range was 125775, and the total size of those games' uploads was ~5.7 TB (I didn't download them, only collected metadata). Now I can see that the number of games is over 200k, and the max game id is in the ~450k range. Quite a lot, but doable :)
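
For anyone curious what "very low request rate" could look like in practice, here is a rough Python sketch, not the script I actually used. The names `throttled` and `estimated_days` are made up for illustration, and the 9-second interval is only a guess that happens to match "a few weeks" for 200000 ids. The actual fetching is left out on purpose, since there are no official scraping rules to code against.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(items: Iterable[int],
              min_interval: float,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[int]:
    """Yield items no faster than one per `min_interval` seconds."""
    last = None  # time at which the previous item was handed out
    for item in items:
        if last is not None:
            # Sleep only for whatever part of the interval has not passed yet.
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


def estimated_days(id_count: int, min_interval: float) -> float:
    """Lower bound on crawl duration in days, ignoring response time."""
    return id_count * min_interval / 86400  # 86400 seconds per day


if __name__ == "__main__":
    # Hypothetical interval; pick something the site can comfortably handle.
    interval = 9.0
    print(f"200k ids: ~{estimated_days(200_000, interval):.1f} days")
    print(f"450k ids: ~{estimated_days(450_000, interval):.1f} days")
    for game_id in throttled(range(1, 4), min_interval=1.0):
        # A real crawler would request the public page for game_id here.
        print("would fetch metadata for game id", game_id)
```

The clock and sleep functions are parameters only so the pacing logic can be checked without actually waiting. With the same assumed interval, the ~450k id range comes out at roughly a month and a half, which is what I mean by "doable".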