Hi, thanks a lot for your input. You are correct that it will be nearly impossible to get data about all games without direct, unthrottled access to the DB.
I like your idea about accessing only subsets of data, especially when this data is “filtered” by users themselves through the process of reviewing and buying games.
The data is also already somewhat normalized, since it counts only paid games that sold enough to make the “top-sellers” list.
Yes, exactly, great idea. I would like my system to find “hidden gems”, but right now I have too little time and money to implement such a system properly.
Right now, I’m building a prototype of a system that can measure similarity between game genres. So, for example, we have game X with the genre “FPS” and game Y with the genre “Third Person Shooter”. A very simple system would say that those games have nothing in common. I am trying to build a system that would say “X is 0.7 similar to Y” (based on genre only), because FPS and TPS have a lot in common, and people who enjoy FPS may enjoy TPS too.
That’s why I need tags. I understand that the data may be incorrect, but for now, I really need a small subset of any sane data to be able to do my calculations.
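To make the idea concrete, here is a rough sketch of one way tags could give a genre-to-genre score. It is not my actual prototype: the tag sets are made up by hand for illustration, the names `GENRE_TAGS`, `jaccard` and `genre_similarity` are hypothetical, and Jaccard overlap is just one simple measure I could swap for something better once I have real data.

```python
# Hypothetical, hand-written tag sets. Real ones would be collected
# from user-assigned tags on games of each genre.
GENRE_TAGS = {
    "FPS": {"shooter", "action", "3d", "first-person"},
    "Third Person Shooter": {"shooter", "action", "3d", "third-person"},
    "Puzzle": {"logic", "casual", "2d"},
}


def jaccard(a, b):
    """Jaccard similarity: shared tags divided by all distinct tags."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty tag sets: treat as no evidence of similarity
    return len(a & b) / len(a | b)


def genre_similarity(x, y, genre_tags=GENRE_TAGS):
    """Score in [0, 1] for how much two genres overlap by their tags."""
    return jaccard(genre_tags[x], genre_tags[y])


if __name__ == "__main__":
    print(genre_similarity("FPS", "Third Person Shooter"))
    print(genre_similarity("FPS", "Puzzle"))
```

With these toy sets, FPS and Third Person Shooter share three of five distinct tags, while FPS and Puzzle share none, so a naive "same genre or not" check is replaced by a graded score.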
Anyway, thanks for your help and fresh ideas. Hope to return with some results soon.