itch.io API for Game Metadata

A topic by YUART created 48 days ago Views: 511 Replies: 14

Hi everyone,

I’m currently working on a personal project involving a game recommendation system, and I’m hoping to use data from itch.io to train and test my system. I’m running into a few questions about accessing the necessary game metadata and would appreciate any insights you can offer.

Specifically, I’m wondering:

  • Direct Database API - beyond the existing JSON API, is there a direct API or access method to the itch.io database itself, that stores game metadata (like tags, descriptions, author, release date, etc.)?

  • Game ID List - is there a way to retrieve a comprehensive list of all game IDs on itch.io via an API? I need these IDs to then collect the metadata for each game. Ideally, I’d also like to be able to filter this list by a specific date range (e.g., games released between March 1, 2024, and April 30, 2024).

  • API Documentation - the main API endpoint seems to be located at https://api.itch.io/. Is there any official documentation available for this API? I’m finding it difficult to understand the full range of available functionalities.

  • API Usage and Scraping Restrictions - are there any specific terms of use or restrictions related to using the itch.io API for data collection or scraping? I want to ensure I’m complying with all the necessary guidelines while gathering data for my project.

Any information, links to documentation, or advice regarding these points would be greatly appreciated. Thank you for your time and assistance!

Moderator(+1)

You mean outside the itch.io API documentation?

Yes. It seems not to contain all the information I need.

Deleted 48 days ago
(1 edit) (+1)

The API is meant for developers, not for users or data scrapers.

The newest project ID is 3460XXX. Since there are "only" 1 million games, trying to collect data for 3.5 million projects is not advisable, even if it were possible, and I do not know if it is. Itch suffers from enough DDoS attempts. Any scraping attempt of yours is bound to be blocked by protection systems anyway, so you will collect a whole lot of nothing.

Use a smaller sample size. The data on Itch is unreliable, especially the tags. They have no corrective factor: they are chosen by the developers and there is no systematic input from the users.

https://itch.io/games/top-rated/year-2014

That is 2000 games.

https://itch.io/games/top-sellers/year-2014

That is 1000 games.

https://itch.io/games/top-sellers/store/year-2014

That is 330 games. You can do 330 games by hand and the data set will have little to no change. And it already is a bit normalized by only counting paid games that actually sold enough to be in the "top-sellers" list.

Or filter down by a specific tag first. There are a lot of tags and tag combinations that give you a manageable number of games, under 1k entries. https://itch.io/games/top-rated/store/tag-fantasy is 900 games, while https://itch.io/games/tag-fantasy is 28000.
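The listing links in this post seem to share one path pattern, so the subsets can be enumerated programmatically. Below is a minimal Python sketch, assuming the path segments compose the way the four links above suggest. `browse_url` and its parameters are hypothetical names, and combinations not shown in the thread (for example a tag together with a year) are untested guesses. The function only builds URL strings and does not fetch anything.

```python
from typing import Optional

BASE_URL = "https://itch.io/games"


def browse_url(sort: Optional[str] = None,
               store_only: bool = False,
               tag: Optional[str] = None,
               year: Optional[int] = None) -> str:
    """Compose a browse URL from the path segments seen in the links above.

    Assumption: segments appear in the order sort / store / tag / year.
    Only the four combinations quoted in the thread are known to exist.
    """
    parts = [BASE_URL]
    if sort:
        parts.append(sort)            # e.g. "top-rated" or "top-sellers"
    if store_only:
        parts.append("store")         # the "store" segment from the thread's links
    if tag:
        parts.append(f"tag-{tag}")    # e.g. "tag-fantasy"
    if year:
        parts.append(f"year-{year}")  # e.g. "year-2014"
    return "/".join(parts)


if __name__ == "__main__":
    print(browse_url("top-rated", year=2014))
    print(browse_url("top-sellers", store_only=True, year=2014))
    print(browse_url("top-rated", store_only=True, tag="fantasy"))
    print(browse_url(tag="fantasy"))
```

The four calls in the `__main__` block reproduce the four links quoted in this post.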

Also, there is no user data to correlate. Itch users are notorious for not following, not commenting, not rating, not even having a user account.

Currently most of the popularity influx comes from YouTube. You can see it in the comments with videos on the popular page. So "popularity" is a poor indicator of how a recommendation would work. You can only say what is popular, not whether a person who likes this would also like that.

If I like X, would I like Y? To predict that, you would have to have user data. Or maybe I misunderstand what you are trying to do or what insights you hope to get from the data you want.

Hi, thanks a lot for your input. You are correct that it will be nearly impossible to get data about all games through direct DB access without throttling and so on.

I like your idea about accessing only subsets of data, especially when this data is “filtered” by users themselves through the process of reviewing and buying games.

And it already is a bit normalized by only counting paid games that actually sold enough to be in the “top-sellers” list.

Yes, exactly, great idea. I would like my system to find “hidden gems”, but right now I have too little time and money to implement such a system properly.

Right now, I’m building a prototype of the system that can find similarity between game genres. For example, say game X has the genre “FPS” and game Y has the genre “Third Person Shooter”. Very simple systems will say that those games have nothing in common. I’m trying to build a system that would say “X is 0.7 similar to Y” (based on game genre only), because FPS and TPS have a lot in common, and people who enjoy FPS may enjoy TPS too.
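One way to get a graded genre score like this is to describe each genre by a set of traits and measure how much the sets overlap. The sketch below is a minimal Python illustration using Jaccard similarity. It is not the poster's actual method, and the trait sets in `GENRE_TRAITS` are made up for the example.

```python
# Hypothetical trait sets; a real system would need a curated taxonomy.
GENRE_TRAITS = {
    "FPS": {"shooter", "3d", "real-time", "aiming", "first-person"},
    "TPS": {"shooter", "3d", "real-time", "aiming", "third-person"},
    "Puzzle": {"logic", "turn-based", "2d"},
}


def genre_similarity(genre_a: str, genre_b: str) -> float:
    """Jaccard similarity: shared traits divided by all traits of both genres."""
    traits_a = GENRE_TRAITS[genre_a]
    traits_b = GENRE_TRAITS[genre_b]
    return len(traits_a & traits_b) / len(traits_a | traits_b)


if __name__ == "__main__":
    # FPS and TPS share 4 of 6 distinct traits, so the score is 4/6.
    print(round(genre_similarity("FPS", "TPS"), 2))
    # FPS and Puzzle share nothing, so the score is 0.
    print(genre_similarity("FPS", "Puzzle"))
```

With these made-up traits, FPS and TPS score about 0.67, close to the 0.7 in the example above, while an exact-match comparison of the genre names would give 0.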

That’s why I need tags. I understand that the data may be incorrect, but for now, I really need a small subset of any sane data to be able to do my calculations.

Anyway, thanks for your help and fresh ideas. Hope to return with some results soon.

(+1)

“hidden gems”

That is highly subjective. One user's gem is another user's trash.

If I understand correctly you want an algorithm to detect underrated games. Not in the factual sense that they have a low number of ratings. But in the sense that they should be more famous.

I do not think you can do that with tags. And not really on Itch. One of the best games I know on Itch is in a special sub-sub-category. It does not even have a tag. (I shall not name it here, but let's say I knew that game before I knew Itch, so to me the game is more famous than Itch itself.)

To give a real world example. https://tomorrowcorporation.itch.io/human-resource-machine

The game has 4 ratings currently. It is from 2018. Has a single comment ... from 2025.

The plot twist is, the game has 3000 reviews on Steam...

That is highly subjective. One user’s gem is another user’s trash.

Yes, I agree with that point, and that’s what my system tries to do (or at least it will try to do that when it is implemented 😅).

If I understand correctly you want an algorithm to detect underrated games.

No, I don’t want to create a system that does that. My idea is to create a system that suits every user’s tastes. The original post was about a small part of that system, so I didn’t bother to describe the whole idea, because it is not the topic of this discussion. If you are interested in more context, see my reply to @user22 below.

(+1)

What's your POV about game tech, its age and status? Your example of TPS and FPS being close is legit, but what happens when the first game is a 3D F2P Android game and the other is a 2D browser furry NSFW game? There are lots of variations to consider before you can even start a comparison, imo.

As the post above says, itch.io gives everyone free rein without any strict rules to follow. Combined with its huge amount of content, that makes a big pile of data to analyze.

The hidden gems you mention, what would you do with them? How would they serve a purpose in your project? Let's say you found 10 games (hidden gems) which are 99% close to the first game, how would that help your goal? Another example: if all those games are $20 each, I doubt people will happily go around buying them.

What's your POV about game tech, its age and status?

Sorry, I don’t understand this question. Do you mean game recommendations systems or gamedev as a whole?

but what happens when the first game is a 3D F2P Android game and the other is a 2D browser furry NSFW game?

I believe it is possible to build a recommendation system that can handle such complexity. It’s hard, but it’s doable. I think the main tactic here is divide-and-conquer: you split the game’s metadata into different, clearly defined characteristics and work with those. From your example, you can describe a few separate parameters:

  • dimension (3D or 2D)
  • platform (Android native game or PC browser)
  • price (F2P or P2P)
  • age rating (NSFW or SFW)
  • style or whatever you call that parameter (furry or whatever)

After that, you can find a way to correctly classify every game in those categories so you can deterministically compare those games and find their similarity weight.
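The divide-and-conquer idea above can be sketched in Python as a weighted comparison over the listed characteristics. This is only an illustration under stated assumptions: `GameProfile`, `WEIGHTS`, and the weight values are hypothetical, each characteristic is compared by exact match, and the price model of the second game is invented because the thread does not give it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameProfile:
    """One clearly defined value per characteristic from the list above."""
    dimension: str    # "3D" or "2D"
    platform: str     # e.g. "android", "browser"
    price_model: str  # "F2P" or "P2P"
    age_rating: str   # "NSFW" or "SFW"
    style: str        # e.g. "furry", "none"


# Hypothetical weights: how much each characteristic counts toward the score.
WEIGHTS = {
    "dimension": 2,
    "platform": 2,
    "price_model": 1,
    "age_rating": 3,
    "style": 2,
}


def profile_similarity(a: GameProfile, b: GameProfile) -> float:
    """Weighted share of characteristics on which the two games agree."""
    total = sum(WEIGHTS.values())
    matched = sum(
        weight
        for name, weight in WEIGHTS.items()
        if getattr(a, name) == getattr(b, name)
    )
    return matched / total


if __name__ == "__main__":
    first = GameProfile("3D", "android", "F2P", "SFW", "none")
    # Price model of the second game is an assumption made for the example.
    second = GameProfile("2D", "browser", "F2P", "NSFW", "furry")
    print(profile_similarity(first, second))  # only price_model matches
```

Because every characteristic has a fixed set of values and a fixed weight, the same pair of games always gets the same score, which is what makes the comparison deterministic.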

As the post above says, itch.io gives everyone free rein without any strict rules to follow. Combined with its huge amount of content, that makes a big pile of data to analyze.

It will be hard to correctly classify every single tag, but I think it’s possible to classify the most valuable.
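A first pass at picking which tags to classify could simply count how often each tag occurs in a collected sample. The Python sketch below assumes that "most valuable" means "most frequent", which is only one possible reading. The function name and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def most_common_tags(games: Dict[str, List[str]], top_n: int) -> List[Tuple[str, int]]:
    """Count in how many games each tag appears and return the top_n tags."""
    counts = Counter(
        tag
        for tags in games.values()
        for tag in set(tags)  # count each tag at most once per game
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample; real data would come from whatever subset was collected.
    sample = {
        "game_1": ["fantasy", "rpg"],
        "game_2": ["fantasy", "horror"],
        "game_3": ["fantasy", "rpg", "pixel-art"],
    }
    print(most_common_tags(sample, 2))
```

Only the tags that make the cut would then need a hand-written mapping to characteristics such as dimension or style.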

The hidden gems you mention, what would you do with them? How would they serve a purpose in your project? Let’s say you found 10 games (hidden gems) which are 99% close to the first game, how would that help your goal? Another example: if all those games are $20 each, I doubt people will happily go around buying them.

My current idea is to create an unbiased recommendation system that is based on users’ experience with other games, not on popularity. This is also a good selling point because, from what I know, big recommendation engines like Steam’s base their recommendations on popularity and traffic, not on personal experiences. The idea is to find not a “good” game but a game that the particular user will like. Some people may enjoy games that are marked as “trash” by the majority and vice versa (some people mark popular games as “trash”).
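A recommendation driven by one user's own ratings instead of global popularity could look roughly like the Python sketch below. It is an assumption-heavy illustration, not the system described in this post: games are represented as tag sets, similarity is Jaccard overlap, and each unseen game is scored by its rating-weighted similarity to the games the user has rated. All names and data are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared tags divided by all tags of both games."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user_ratings: Dict[str, int],
              catalog: Dict[str, Set[str]],
              top_n: int = 3) -> List[str]:
    """Rank unseen games by rating-weighted similarity to the user's rated games.

    Popularity is never consulted: only this user's ratings (assumed 1 to 5)
    and the tag sets in the catalog influence the score.
    """
    rating_sum = sum(user_ratings.values())
    if rating_sum == 0:
        return []  # nothing known about this user yet
    scores = {}
    for candidate, candidate_tags in catalog.items():
        if candidate in user_ratings:
            continue  # skip games the user already rated
        weighted = sum(
            rating * jaccard(candidate_tags, catalog[rated])
            for rated, rating in user_ratings.items()
        )
        scores[candidate] = weighted / rating_sum
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    catalog = {
        "game_a": {"shooter", "3d", "sci-fi"},
        "game_b": {"shooter", "3d", "fantasy"},
        "game_c": {"puzzle", "2d"},
        "game_d": {"puzzle", "2d", "sci-fi"},
    }
    ratings = {"game_a": 5, "game_c": 1}  # loved game_a, disliked game_c
    print(recommend(ratings, catalog))
```

In this made-up example the user who rated the shooter highly gets the other shooter ranked first, regardless of how many other people like either game.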

(+1)

Games are made with different technologies, and a game from 10 years ago (with the exact same tags and content) is not the same as a game released this year. As for status, some games are works in progress, some are complete, and some may be demos. Such diversity may add another layer of complexity.

The current emphasis on popularity and sales works because it causes a viral effect and marketing serves its purpose.

The idea you have is interesting, but given the potential walls and complexity, it may not be worth it, or it may be hard to execute and monetize. I bet you have heard that the YouTube/Netflix/Amazon recommendation systems are bad. TikTok, on the other hand, rocks, but its algorithms are based on millions of data points and its content can be consumed within seconds or minutes, unlike games.

Regardless, good luck and keep us posted!

(+1)

Correct, creating a good recommendation system for anything is a very hard task. I certainly can’t do that alone, especially because I’m not a data science guy; I only have a bit of expertise in gamedev. What I hope is to be able to create a prototype and see if anyone is interested. If not, I will just move on to another idea.

What I hope is to be able to create a prototype and see if anyone is interested.

To state the obvious.

https://itch.io/library/recommendations

Here are some things you might like based on your rating, purchase, and download history.

Itch has such a recommendation system and Itch has more data available than you.

Steam has a lot more data. It knows which games you actually play. For a proof of concept Itch is a terrible place to get test data and correlate the fingerprint of a game to how people like it or not. You just cannot use tags in a reliable way here. They do not accurately describe the games. Doing maths on those is acting on false premises. You will not find the "hidden" gems, you will find the "obvious" gems - which people already found themselves or get in their Itch recommendations.

If you have test users, try using their public Steam profiles, if they have one. The games on Steam do have mostly accurate tags, because they are chosen by users and even ranked by relevance. And they come from a fixed pool.

And then compare your predictions to the Steam recommendations those test users get.
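The comparison suggested here could be quantified with a simple overlap measure. The Python sketch below assumes you already have two lists of game identifiers for a test user, your own ranked predictions and the recommendations the platform showed them. How the platform list is obtained is not covered, and `overlap_at_k` is a hypothetical helper name.

```python
from typing import List, Sequence


def overlap_at_k(predicted: Sequence[str], reference: Sequence[str], k: int) -> float:
    """Share of the top-k predicted games that also appear in the reference list."""
    top = list(predicted[:k])
    if not top:
        return 0.0
    reference_set = set(reference)
    hits = sum(1 for game in top if game in reference_set)
    return hits / len(top)


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    my_predictions: List[str] = ["game_a", "game_b", "game_c", "game_d"]
    platform_recommendations: List[str] = ["game_b", "game_d", "game_x"]
    print(overlap_at_k(my_predictions, platform_recommendations, 3))
```

A high overlap means the prototype mostly reproduces what the platform already recommends, while a low overlap means it surfaces different games. Neither outcome says on its own whether users like the results.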

I’m not a data science guy

Steam has those, and I hope Itch asked someone that knows this stuff. The recommendation systems in place on those platforms are made by expert professionals.

There are also independent recommendation systems on the net. This is an area that research has gone into. Basically every store has a vested interest in analyzing such data and using it for advertisement recommendations. https://duckduckgo.com/?q=game+recommendation+engine&ia=web

Sorry, but I see we are not on the same page here, and I also have no time to go deep into how my idea can provide value and/or compete with existing platforms. I’m currently planning things so that I can also answer questions like yours (“Why should I use your system if there are already working systems from big companies?”).

Anyway, thank you for your previous hints and ideas. I will ping you when I am ready to present some “real” things about my recommendation system, and I will be happy to hear your feedback.

(+1)

Don't bother answering people who only want to demotivate you. Keep working on your project, you'll find a way to do it ;)

(+1)

Thanks :)