
The API is meant for developers, not for users or data scrapers.

The newest project ID is 3460XXX. Since there are "only" 1 million games, trying to collect data for 3.5 million projects is not advisable, even if it were possible, and I do not know if it is. Itch suffers from enough DDoS attempts. Any attempt of yours to scrape data is bound to be blocked by protection systems anyway, so you will collect a whole lot of nothing.

Use a smaller sample size. The data on Itch is unreliable, especially the tags. They have no corrective factor: they are chosen by the developers, and there is no systematic input from the users.

https://itch.io/games/top-rated/year-2014

That is 2000 games.

https://itch.io/games/top-sellers/year-2014

That is 1000 games.

https://itch.io/games/top-sellers/store/year-2014

That is 330 games. You can do 330 games by hand, and the data set will change little, if at all. It is also already a bit normalized, because it only counts paid games that actually sold enough to be in the "top-sellers" list.

Or filter down by a specific tag first. There are a lot of tags and tag combinations that give you an observable number of games, under 1k entries. https://itch.io/games/top-rated/store/tag-fantasy is 900 games, while https://itch.io/games/tag-fantasy is 28000.
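To make the filtering idea concrete, here is a minimal sketch that composes the kind of browse URLs listed above. The function name `browse_url` and its parameters are hypothetical. The segment order is inferred only from the example links in this post, so combining a tag with a year is an untested assumption, and "store" is treated as "paid games only" based on the description above.

```python
from typing import Optional

BASE = "https://itch.io/games"


def browse_url(sort: Optional[str] = None,
               store_only: bool = False,
               tag: Optional[str] = None,
               year: Optional[int] = None) -> str:
    """Compose a browse URL from optional filters (hypothetical helper).

    The segment order (sort, store, tag, year) mirrors the example links
    in the post; it is an assumption, not documented behaviour.
    """
    parts = [BASE]
    if sort:
        parts.append(sort)            # e.g. "top-rated" or "top-sellers"
    if store_only:
        parts.append("store")         # assumed to mean paid games only
    if tag:
        parts.append(f"tag-{tag}")    # e.g. "tag-fantasy"
    if year:
        parts.append(f"year-{year}")  # e.g. "year-2014"
    return "/".join(parts)
```

For example, `browse_url("top-sellers", store_only=True, year=2014)` reproduces the 330-game link above. The point is to pick a small, pre-filtered list to look through, not to crawl every combination.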

Also, there is no user data to correlate. Itch users are notorious for not following, not commenting, not rating, and not even having a user account.

Currently, most of the popularity influx comes from YouTube. You can see it in the comments with videos on the popular page. So "popularity" is a poor indicator of how a recommendation would work. You can only say what is popular, not whether a person who likes this would also like that.

If I like X, would I like Y? To predict that, you would have to have user data. Or maybe I misunderstand what you are trying to do or what insights you hope to get from the data you want.

Hi, thanks a lot for your input. You are correct that it will be nearly impossible to get data about all games without direct, unthrottled access to the DB.

I like your idea about accessing only subsets of data, especially when this data is “filtered” by users themselves through the process of reviewing and buying games.

And it already is a bit normalized by only counting paid games that actually sold enough to be in the “top-sellers” list.

Yes, exactly, great idea. I would like my system to find “hidden gems”, but right now I have too little time and money to implement such a system properly.

Right now, I’m building a prototype of a system that can measure similarity between game genres. So, for example, we have game X with the genre “FPS” and game Y with the genre “Third Person Shooter”. Very simple systems will say that those games have nothing in common. I am trying to build a system that would say “X is 0.7 similar to Y” (based on game genre only), because FPS and TPS have a lot in common, and people who enjoy FPS may enjoy TPS too.
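One possible way to get such a score, shown here only as a rough sketch and not as the actual prototype: describe each tag by the other tags it appears alongside, then compare two tags with cosine similarity. The toy `games` data and the function names below are made up for illustration. Real tag data from Itch would be noisier.

```python
from collections import Counter
from itertools import permutations
from math import sqrt

# Hypothetical toy data: each game is the set of tags its developer chose.
games = [
    {"fps", "shooter", "3d", "action"},
    {"fps", "shooter", "multiplayer"},
    {"third-person-shooter", "shooter", "3d", "adventure"},
    {"third-person-shooter", "shooter", "multiplayer"},
    {"visual-novel", "romance", "2d"},
    {"visual-novel", "story-rich", "2d"},
]


def cooccurrence_vectors(games):
    """Map each tag to a Counter of the tags it shares a game with."""
    vectors = {}
    for tags in games:
        for a, b in permutations(tags, 2):
            vectors.setdefault(a, Counter())[b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (sqrt(sum(x * x for x in u.values()))
            * sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def tag_similarity(vectors, a, b):
    """Similarity of two tags based on the company they keep."""
    return cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))


vectors = cooccurrence_vectors(games)
fps_vs_tps = tag_similarity(vectors, "fps", "third-person-shooter")
fps_vs_vn = tag_similarity(vectors, "fps", "visual-novel")
```

With this toy data, "fps" and "third-person-shooter" never appear on the same game, yet they score high because both sit next to "shooter", "3d" and "multiplayer", while "fps" and "visual-novel" share no neighbours and score 0. Comparing tags directly (same tag or not) would miss that, which is the "nothing in common" problem described above.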

That’s why I need tags. I understand that the data may be incorrect, but for now, I really need a small subset of any sane data to be able to do my calculations.

Anyway, thanks for your help and fresh ideas. Hope to return with some results soon.

(+1)

“hidden gems”

That is highly subjective. One user's gem is another user's trash.

If I understand correctly you want an algorithm to detect underrated games. Not in the factual sense that they have a low number of ratings. But in the sense that they should be more famous.

I do not think you can do that with tags, and not really on Itch. One of the best games I know on Itch is in a special sub-sub-category. It does not even have a tag. (I shall not name it here, but let's say I knew that game before I knew Itch, so to me the game is more famous than Itch itself.)

To give a real-world example: https://tomorrowcorporation.itch.io/human-resource-machine

The game has 4 ratings currently. It is from 2018. Has a single comment ... from 2025.

The plot twist is, the game has 3000 reviews on Steam...

That is highly subjective. One user’s gem is another user’s trash.

Yes, I agree with that point, and that’s what my system tries to do (or at least it will try to do that when it is implemented 😅).

If I understand correctly you want an algorithm to detect underrated games.

No, I don’t want to create a system that does that. My idea is to create a system that suits every user’s tastes. The original post was about a small part of that system, so I didn’t bother to describe the whole idea, because it is not the topic of this discussion. If you are interested in more context, see my reply to @user22 below.