
Big Improvements to Library-To-CSV Scraper

Back in April I shared a blog entry about a UserScript that I wrote to scrape Itch libraries into CSV lists. You can read the initial post here.

The original project came to fruition when I wanted to import a huge collection I had on here into an SQL database so I could pull a game at random.

The original script was okay. It did the trick: it grabbed game titles and URLs and threw them into a CSV. But I realized that I wanted more. Maybe I was in the mood for a Platformer or an Adventure game but kept rolling Simulation games. Maybe I only wanted to focus on games by a certain author.

Well, problem solved - the script now scrapes pretty much everything but the platform. There's definitely still some clean-up that could be done in the code, but I don't think I'm done with it yet. I'd like to get it doing what I want before I go in and clean things up more.
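
A minimal sketch of the CSV step, for anyone curious what it boils down to. This is not the code from the gist: the helper names (csvField, toCsv), the column names, and the sample entry are all made up for illustration.

```javascript
// Hypothetical helper: quote one field following common CSV conventions.
function csvField(value) {
  const text = String(value ?? "");
  // Quote only when the field contains a comma, a quote, or a line break.
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Hypothetical helper: turn an array of game objects into CSV text.
function toCsv(games, columns) {
  const header = columns.map(csvField).join(",");
  const rows = games.map((game) =>
    columns.map((column) => csvField(game[column])).join(",")
  );
  return [header, ...rows].join("\r\n");
}

// Made-up sample entry, not a real library item.
const sample = [
  {
    title: 'Example, the "Game"',
    author: "someone",
    genre: "Platformer",
    url: "https://example.com/game",
  },
];

console.log(toCsv(sample, ["title", "author", "genre", "url"]));
```

Quoting matters here because game titles and synopses often contain commas, which would otherwise shift the columns when the file is imported into a database.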

The link to the script hasn't changed - it's still available here: https://gist.github.com/abraxas86/ad72ba46b6cdd86dc63058bba0c629c2

You will need to install Tampermonkey in your browser to get it to work.


*Note: You need to scroll all the way to the bottom of the page to scrape the list, as it needs to load all the games to really do what it's meant to do.  If you refresh the page at the bottom before you scrape, you'll need to scroll all the way up, then jump back down to the bottom to get the button.



Hi abraxas86! This is really nice, however I couldn't get it to work. I went to My Library -> My Purchases, then scrolled all the way to the bottom of the page, but I see no button. Am I missing something maybe?


Hey, sorry for the late reply.  You didn't miss anything - in fact, it was *me* that missed the thing haha.

This line near the top tells the script what URLs to activate on:

// @match        https://itch.io/c/*

Great for collections, but apparently the "my purchases" section doesn't use a URL that follows the pattern.

I've added an additional @match line to the code. If you hop over to the gist and click the "raw" button, it should give you the option to update. I tested on my "my-purchases" page and it worked okay. I think if you scroll the list too quickly, it might not load all the data in properly. When you get to the bottom, I think if you scroll all the way up, and then scroll all the way down again before exporting, it should fix things up.
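
To illustrate why the collections pattern misses the purchases page, here is a rough sketch. It only approximates how a userscript manager tests @match patterns (it treats "*" as "any run of characters"), and both the second pattern and the sample collection URL are assumptions; the exact line added to the gist may differ.

```javascript
// Simplified approximation of @match testing: "*" matches any run of
// characters. Real userscript managers apply stricter match-pattern rules.
function matchesPattern(pattern, url) {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + "$").test(url);
}

const collectionsPattern = "https://itch.io/c/*";
// Assumed second pattern, for illustration only.
const purchasesPattern = "https://itch.io/my-purchases*";

// Hypothetical collection URL.
console.log(matchesPattern(collectionsPattern, "https://itch.io/c/123/some-collection")); // true
console.log(matchesPattern(collectionsPattern, "https://itch.io/my-purchases")); // false
console.log(matchesPattern(purchasesPattern, "https://itch.io/my-purchases")); // true
```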

If you find any other errors, let me know so I can patch it up more. Thanks for letting me know about this shortcoming :)


This now works! The only thing is that the "genre" and "synopsis" columns are left blank. I guess there's no way to retrieve them from the "my purchases" page?

I don't get how itch manages the library: I bought a 900+ item bundle some time ago and I could not claim the items all at once. I had to use a third-party script to automate the process. Now, the strange thing is that some of the items appear in a so-called "collection" in my profile, while others don't.

If only there was a way to add all of the items to a single collection, I could use your script to parse the entire library, including the synopsis.


The data we get back on the purchases page kinda sucks.

In the other collections you get a lot of stuff: title, price, synopsis, publisher, genre, platform.
In the "my-purchases" we get: How long ago you bought it, title, and publisher.

If I can find a way to get that section to open like a normal collection, I'll let you know.  I suppose the info could still be scraped, but it would have to fetch every single game and pull the data from its game page... not sure how much work that would involve lol


I think I had to do something weird with the huge bundle I bought a couple years ago as well.  It was such a pain to get through, I remember that much.  I basically went through and created a "Games to check out" collection of the ones I thought looked interesting.

It might actually be possible to create a button to add games from your purchases to a proper collection... 🤔   Looks like the site uses URLs like this:

https://itch.io/[game-url]/add-to-collection

to add a game to a collection, so I just need to inject a button for each game to do the same thing.  I could probably expand on that logic to bulk-add games to collections.
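
A rough sketch of that bulk idea, building one add-to-collection link per game. The helper name addToCollectionUrl and the example.com addresses are made up, and it assumes "[game-url]" above stands for the game's own page URL with "/add-to-collection" appended; nothing here has been checked against how the site actually handles these links.

```javascript
// Hypothetical helper: append "/add-to-collection" to a game page URL.
function addToCollectionUrl(gameUrl) {
  const url = new URL(gameUrl);
  // Drop trailing slashes so the result never contains "//add-to-collection".
  url.pathname = url.pathname.replace(/\/+$/, "") + "/add-to-collection";
  // Query strings and fragments from the original link are not needed.
  url.search = "";
  url.hash = "";
  return url.toString();
}

// Placeholder game links, not real pages.
const gameUrls = [
  "https://example.com/some-game/",
  "https://example.com/other-game?ref=library#top",
];

console.log(gameUrls.map(addToCollectionUrl));
```

An injected button for each game would then only need to open (or request) the matching link, and a bulk version could walk the whole list.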


Hi abraxas86 and thanks a lot for this script! I was looking for such a solution and you've done it.

Just tested it, it works like a charm.

The major need is a nice way to manage collections, as that is not yet possible on Itch. With your script, any web dev could build on it and do what they want, as I want to do too.

I looked at how the data is pulled from Itch and yes, it's all HTML from the request, and the API does not permit this kind of access (like the JSON results we'd like to have). BUT there are fetch requests like this one: https://itch.io/my-collections/more-games/9211?page=2 that bring back the same HTML that you parse in your Tampermonkey script, so your script could probably be updated to automatically get all the collections in one click! Yes, it's more work to do. I'm not asking you to code it; it's definitely something I or someone else can try too.

Ideally, an Electron app (or even a website) could do the job :p (again, I'm not asking you to make it ^^). That way the app could be shared with every Itch user.

Your script is a very good start, nice job!
I'll keep you informed if I work on it.

Edit: For a standalone app, I realise the credentials could be tricky to manage, as the collection URL is not part of the API. Perhaps with the OAuth login we could make requests as the logged-in user. Not sure, to be continued...

Hey, thanks for the kind words!  Glad this script was useful for you!  I was surprised when I first went on the hunt to figure out how to export the collections - I thought for sure there would be an option somewhere on the site and I was just missing it; but in my hunt, all I could find were a few posts of other people asking how to do it (and being told it couldn't be done).

Nice catch on that API fetch!  I'm still pretty new to all this stuff so I wouldn't have thought to go digging for something like that... but maybe I'll see if I can find a way to take advantage of it.  I think the biggest hurdle will be figuring out how many pages there are, but I guess I could find that by iterating through the pages until I get some sort of error back.  🤔
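
The "keep going until something fails" idea could look roughly like this. It is only a sketch: fetchPage is a hypothetical callback standing in for the real request, the stop conditions (an error or an empty page) are assumptions about how the site behaves past the last page, and the fake data exists only to make the example runnable.

```javascript
// Walk pages 1, 2, 3, ... until a page fails or comes back empty.
// fetchPage is a hypothetical callback: (pageNumber) => Promise<array>.
async function collectAllPages(fetchPage, maxPages = 500) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    let items;
    try {
      items = await fetchPage(page);
    } catch (err) {
      break; // treat an error as "ran past the last page"
    }
    if (!items || items.length === 0) break; // an empty page also ends the walk
    all.push(...items);
  }
  return all;
}

// Stand-in for the real request, using made-up data.
const fakePages = { 1: ["Game A", "Game B"], 2: ["Game C"] };
const fakeFetchPage = async (page) => {
  if (!(page in fakePages)) throw new Error("no such page");
  return fakePages[page];
};

collectAllPages(fakeFetchPage).then((games) => console.log(games));
```

The maxPages cap is there so a page that never comes back empty cannot turn this into an endless loop.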

I've got a few things on the go so I can't tackle another update to this yet, but I'll keep it in mind :)


Hey :) Glad to have a reply from you too! (And glad you played and commented on my last game jam entry ^^ I'll reply to it later ;))

I'm on my way to grabbing all the information at once. If you give me until, let's say, the end of September, I could probably show you the result (I have other things to do too, and not much free time). So far I've managed to grab the information from the fetch call and use your parser to save the data, but not in a loop yet.

In the end, perhaps it will be another script, or perhaps the script could have two purposes: "Export all" on the main page and "Export collection" on the collection page ¯\_(ツ)_/¯ (if we can export everything at once, it would probably be useless to export just one collection).

What do you think?

Cheers!

PS: the correct URL is more like this one than the one I put in my first message: https://itch.io/c/{collection_id}/{collection_slug}?page=2&format=json
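
As a small sketch, that URL pattern can be built for any page number like this. The helper name collectionPageUrl and the sample id and slug are made up, and the shape of the JSON that comes back is not described in this thread, so the sketch stops at building the address.

```javascript
// Hypothetical helper: fill in the URL pattern quoted above for one page.
function collectionPageUrl(collectionId, collectionSlug, page) {
  const url = new URL(
    `https://itch.io/c/${collectionId}/${encodeURIComponent(collectionSlug)}`
  );
  url.searchParams.set("page", String(page));
  url.searchParams.set("format", "json");
  return url.toString();
}

// Made-up id and slug, for illustration only.
console.log(collectionPageUrl(123, "my-collection", 2));
```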

Sounds good to me!  If I get some extra time to mess around with this in the meantime, I'll let you know :)