GMTK Game Jam 2019

Hosted by Game Maker's Toolkit · #gmtkjam

2,553 Entries · 46.6k Ratings

JSON export of jam results + results discussion

A topic by leafo created Aug 12, 2019 Views: 3,188 Replies: 42

leafoAdmin4 years ago (4 edits) (+7)

Hey all,

I've been closely watching the feedback people have been giving about the results of the jam. There's a lot of discussion going on!

I thought it might be interesting to create a JSON export of all the results so people can experiment with coming up with their own ranking formulas (or do any other analysis they want). I can't say if the outcome of this topic will impact the results of the jam. I'm not the host of the jam, just an itch.io admin. But I think this could be a cool way to explore how itch.io may handle ranking results in jams in the future.

Before we get to the results file though, I want to explain how the current ranking system works.

Current ranking system

The score formula:

final_score = average_score * sqrt(min(1, num_ratings / median_num_ratings))

average_score is the average score of all the ratings you got
num_ratings is the number of ratings you got
median_num_ratings is the median ratings per submission across all submissions in the jam

In plain English: For submissions that have number of ratings ≥ median, the average score (aka raw score) becomes the final score. Submissions that have less than the median number of ratings will have their score penalized proportional to how few ratings they got. The sqrt is used to make the penalty falloff a bit more gradual.

Approximately half of all games will have a penalized score (due to the use of the median). The idea is to balance between not giving popular games with a lot of ratings an advantage, and letting lesser known submissions have a chance at having a high score. The median puts that balance right in the middle.

A submission's rank is then calculated by sorting all the submissions by their final_score
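For anyone who wants to experiment, here is a minimal Python sketch of the formula and ranking step described above. The tuple layout `(name, average_score, num_ratings)` and the example numbers are assumptions made up for illustration; they are not taken from the results file, and this is not itch.io's actual implementation.

```python
import math
from statistics import median


def final_score(average_score, num_ratings, median_num_ratings):
    """Penalize the raw average when a submission has fewer ratings than the median."""
    ratio = min(1.0, num_ratings / median_num_ratings)
    return average_score * math.sqrt(ratio)


def rank_submissions(submissions):
    """Return (name, final_score) pairs sorted best first.

    `submissions` is a list of (name, average_score, num_ratings) tuples.
    This layout is hypothetical; adapt it to whatever the JSON export contains.
    """
    med = median(n for _, _, n in submissions)
    scored = [(name, final_score(avg, n, med)) for name, avg, n in submissions]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up example data (median number of ratings is 10 here).
example = [
    ("A", 4.8, 5),   # below the median, so its score is penalized
    ("B", 4.5, 10),  # at the median, keeps its raw score
    ("C", 4.2, 40),  # above the median, keeps its raw score
]
ranking = rank_submissions(example)
```

With this data, "A" has half the median number of ratings, so its raw 4.8 is multiplied by sqrt(0.5), which drops it below both "B" and "C". Extra ratings beyond the median give no bonus.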

Results dump

(It's a large JSON file)

https://itch.io/jam/10205/results.json

Looking forward to seeing what people come up with

Note about ratings

This jam uses Public Voting. This means that anyone with an itch.io account can rate a submission. Although this gets a lot of people to participate, I personally think this method is inherently flawed. It can encourage cheating, and it often ends up detecting who has the largest network of friends available to vote on their project. (The alternative rating method gives a vote only to people who have submitted to the jam.) Just something to consider.

Thanks

Bad PiggySubmitted4 years ago

I suppose this makes sense.....but of course, it can be abused like anything

This works for smaller jams, but not for something of this size.

Also, if "design" is the only criterion on which the final list is going to be based, why have other criteria in the first place? 😂

leafoAdmin4 years ago (1 edit) (+1)

When Mark created the jam he chose Design as the primary criterion, so final results are based on that one. I'm not sure if that was intentional or not. If he wants, it can be changed to average all the criteria, but we'll have to wait until he wakes up to find out!

Bad PiggySubmitted4 years ago

Well I suppose so..... anyway, it was a good learning experience.

I'm just happy my rank is under 2000 😂😂

SimpleTeamSubmitted4 years ago(+1)

I wonder if It is possible to disable the Public rating ?

Derpyhero214 years ago (1 edit)

Hi, I just wanted to let you know that I can't run your game Mario 2Dyssey because I don't have any software to run it. It looks like a very good game, but once again I can't run it, so if you could let me know what software I need, that would be greatly appreciated.

thanks!

also sorry if none of that made sense :P

SimpleTeamSubmitted4 years ago

Thanks !

ToasterHeadSubmitted4 years ago (1 edit)

Hey, quick question! If I change the game file (and let's suppose Mark sees my game), will Mark see my original one or the most current one? Because some people might change the file of their game and fix all the bugs and basically update their game.

leafoAdmin4 years ago

Pages aren't locked anymore so it's possible for people to change their files now. If Mark is supposed to be playing the original submission then you should not change your file. There's a date associated with when the file is added.

mindcopgamesSubmitted4 years ago

How can one see when a file was updated? It seems people are starting to update their files.

Ignacio GarciaSubmitted4 years ago (1 edit) (+4)

First of all, thanks for being open and looking for ways to improve the system.

Second, these are my suggestions. I’ve been discussing them quite a lot in the GMTK Discord.

The host should pick a % of number of ratings. Games below that % should get a proportionally lower score based on that %, but it should not affect those above it. (Basically the system as it is now, but without boosting people whose games are popular, so that everyone is on even footing instead of it being a popularity contest, and giving the host the power to set that % because the median is not enough for larger jams.)

There should be a way to browse the top rated games without ranking them, so they can’t be targeted easily, like showing the 10% most rated games but sorted in random order. (This makes good games float up without putting a target on people who are really popular.)

Don’t show others the ratings and comments your game has. (This makes bombing someone that has lots of ratings harder and it gets rid of the popularity bias that makes people be more critical towards those games.)

Only people who submitted a game can rate and only after leaving a comment. (This puts the focus on giving good feedback and fair ratings, getting rid of friends and family voters that boost games.)

Those are my thoughts, I'll add more if I can come up with other suggestions.
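One possible reading of the first suggestion, sketched in Python: the host picks a percentile of the rating counts, and that count replaces the median as the penalty threshold. The function names, the nearest-rank percentile method, and keeping the sqrt falloff from the current formula are all assumptions for illustration, not something specified in the thread.

```python
import math


def threshold_from_percentile(rating_counts, percentile):
    """Rating count at the given percentile (0-100), using the nearest-rank method."""
    ordered = sorted(rating_counts)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def penalized_score(average_score, num_ratings, threshold):
    """Same shape as the current formula, with a host-chosen threshold."""
    ratio = min(1.0, num_ratings / threshold)
    return average_score * math.sqrt(ratio)


# Made-up rating counts for ten submissions.
counts = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
median_like = threshold_from_percentile(counts, 50)  # behaves like the current system
stricter = threshold_from_percentile(counts, 80)     # more ratings needed for a full score
```

Raising the percentile raises the number of ratings needed for an unpenalized score; games at or above the threshold still get no boost.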

DomenPigeonSubmitted4 years ago

I agree also, I was thinking of something similar myself. 👍👍

JWoodrellSubmitted4 years ago(+3)

for those who have trouble parsing json, here is the data in a spreadsheet to copy into your Excel of choice.
spreadsheet

rocketfallsSubmitted4 years ago (1 edit) (+1)

The one thing I think most of us could agree on is that it definitely shouldn't have been a public vote. For a smaller jam it probably helps, but for such a large thing it definitely only dilutes the score.

I appreciate the intent of the balancing against median thing but I feel like it's just as abusable as everything else, especially in combination with the public rating. There's probably not an easy solution to it all around

LRFLEWSubmitted4 years ago (2 edits)

I'm assuming you made a transcription error with the equation.

max(1, num_ratings / median_num_ratings)

This would result in 1 if num_ratings <= median_num_ratings, and >1 if num_ratings > median_num_ratings. I'm assuming you meant "min" here, as that would match the description of the function.

Also side note: I feel like the median was possibly influenced by "invalid" submissions. I encountered a number of submissions that were not playable (at least on Windows or Browser, as the rules require), meaning they inherently got less ratings. I wonder how much this would have influenced the resulting median.

leafoAdmin4 years ago (1 edit)

Good catch, thanks, it was a mistake made when typing up the post. I've updated the original post.

leafoAdmin4 years ago

Also side note: I feel like the median was possibly influenced by "invalid" submissions. I encountered a number of submissions that were not playable (at least on Windows or Browser, as the rules require), meaning they inherently got less ratings. I wonder how much this would have influenced the resulting median.

For things that were truly broken, I disqualified those (about 37 submissions). They will not affect the median. Although it's difficult to get every broken submission, the (relatively) small number of disqualifications suggests that the median probably wasn't influenced too much.

CerosWareSubmitted4 years ago (4 edits)

Number of ratings should be factored in somehow, not excluded. It actually reflects the community's view of a game. Right now it feels like games with a high number of ratings are the ones being penalized, because they're more "exposed" to bad ratings. Many games with fewer than 20 ratings are topping the Design list.

CerosWareSubmitted4 years ago(+1)

For example: Gooey Castle ranked 1st with 73 ratings. Totally awesome. 2nd place? 11 ratings. That doesn't feel like it carries the same weight at all.

Deleted post4 years ago

Deleted 2 years ago

leafoAdmin4 years ago (1 edit) (+2)

My definition of cheating is when someone creates fake accounts to create fraudulent votes on a submission.

I manually went through all of their ratings, and I don't see any evidence of that. It's very likely that they have access to a lot of distinct people (their friends) who were willing to go and create itch.io accounts and vote on their entry.

That's not cheating, but a disadvantage of the public voting system. Many other submissions did the exact same thing; they just were the ones who happened to have the most friends.

CerosWareSubmitted4 years ago (1 edit) (+1)

I'm not accusing Gooey Castle of being fraudulent. That's another topic. All I'm saying is that there are games on the top 100 overall that may be great games, but weren't exposed enough to the community. On the other hand, we've got games with both high ranking AND high number of ratings; those are totally fine in my book (like TowerBag). Should every highly rated game be ranked higher? Not exactly. A popular game may be horrendous (#5 most rated for example) and should tank accordingly.
I'm not saying I've got the answer for this issue. I'm only pointing out that highly rated games are exposed to a wider perspective from the community, and that should have some value.

12 games from the top 100 most rated games made it into the top 100 overall list.

CerosWareSubmitted4 years ago (1 edit)

A more positive example: Negative Nancy, #17 overall which is excellent. However, with 81 ratings, it should totally be among the top 3.

minimaulSubmitted4 years ago(+4)

I guess the "impact" of having very few ratings should indeed be stronger. There are games in the top 100 that have only 8 or 9 ratings (all done by a bunch of friends, as comments are not written in English).

If the best strategy to be on top of the list is to show the game only to a few people (to ensure 5 stars), that goes against the idea of a game jam to me.

CerosWareSubmitted4 years ago(+1)

If the best strategy to be on top of the list is to show the game only to a few people (to ensure 5 stars), that goes against the idea of a game jam to me.

Bingo!

kaosklownSubmitted4 years ago(+1)

Some people know how to promote their game in the community forums and get dozens of ratings and comments, which has nothing to do with game design... As long as a game has more ratings than the median, it feels right to me to consider their score valid.

To me, voting shouldn't be public.

There could even be an option for a dev to be presented a random game to rate, and that rating could have more weight than non random ones, that way you ensure fair rating.

HarrySubmitted4 years ago

Yeah, for this jam we had a variant of the Randomiser to show us a random jam entry (which worked very nicely), so it could be interesting to see an option for only the Randomiser to be available when picking games (plus the top 100 least rated games list). Not sure how feasible, but interesting nonetheless.

HarrySubmitted4 years ago

Ok I have a bit of a thought about Public Rating. Perhaps an extra thing you could do would be to allow accounts to submit ratings, but only actually count/submit them when the user has also rated 2 other games. That way, more games get votes, the voter gets a wider sample pool by which to base their ratings, and it makes it so users can't just instantly make an account and rate a single game. It also gets them more involved in the Itch ecosystem which is nice!
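A rough Python sketch of that idea: a rating only counts once its author has rated at least three distinct games (the one in question plus two others). The `(user, game, score)` tuple layout and the function name are hypothetical, and this is not an existing itch.io feature.

```python
from collections import Counter


def countable_ratings(ratings, min_games_rated=3):
    """Keep only ratings from users who rated at least `min_games_rated` distinct games.

    `ratings` is a list of (user, game, score) tuples (a hypothetical layout).
    """
    games_per_user = Counter()
    seen = set()
    for user, game, _ in ratings:
        # Count each (user, game) pair once, even if it appears more than once.
        if (user, game) not in seen:
            seen.add((user, game))
            games_per_user[user] += 1
    return [r for r in ratings if games_per_user[r[0]] >= min_games_rated]


# Made-up example: "ann" rated three games, "bob" rated only one.
sample = [
    ("ann", "g1", 5),
    ("ann", "g2", 4),
    ("ann", "g3", 3),
    ("bob", "g1", 5),
]
counted = countable_ratings(sample)
```

Here only ann's three ratings would be counted; bob's single rating would stay pending until he rates two more games.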

Deleted postSubmitted4 years ago

Deleted 4 years ago

HarrySubmitted4 years ago(+1)

Yeah - public voting in large jams will be an issue as people can naturally get competitive or worried about not being validated by reaching a high enough rating - I think either way being able to check whether a person is just downvoting other submissions (perhaps if many times they've voted far below the mean of results) would be an interesting safeguard, or making it so only Itch accounts of a certain age can vote (e.g. were created before the jam started, or before voting started) but ultimately any form of public voting whatsoever on the internet is going to have to rely on trusting everyone involved to an extent.

Carter GamesSubmitted4 years ago

I agree that voting should be limited to people in the jam; it makes it feel more like a community effort. I've seen a lot about the rating system causing games with few ratings to do well while those with more ratings do worse. While I can see this being a bad thing, I wouldn't try to remove it entirely, as otherwise it just comes down to who has the most attractive looking game with the best art that has gotten the most attention because of it. While having nice art is always a plus, this jam in particular was focused on design at the end of the day. So realistically it's just a matter of tweaking the formula to work better. I'd also add that maybe community engagement (on itch.io) should be taken into account, maybe the number of comments over 100 words. Or have a category for best sport based on that factor, just to encourage people to review and rate games.

That's my 2 cents on the matter as of writing...

xdanSubmitted4 years ago

So if not enough people play my game I get penalized? What am I supposed to do? Invest on a marketing campaign?

Team TerribleSubmitted4 years ago

There will be a functional minimum, but essentially yes. Though I think this should be a function of your participation - the more people you vote for the more votes you should get - rather than who has the biggest twitter following...

leafoAdmin4 years ago

This time around, the number of ratings needed to avoid the penalty is pretty low (10). It wasn't hard to get there by leaving constructive comments on other people's projects and letting them discover your game that way.

Many people are pushing for a higher minimum rating level for a full score though, which would indeed make it harder for games that don't take off in some way. We'll have to wait until next year to see how things are handled.

Deleted post4 years ago

Deleted 4 years ago

Team TerribleSubmitted4 years ago(+2)

I like the idea of only allowing those who submitted to rate. But allow anyone to comment.

You can minimise the abuse of creating blank/fake entries by ignoring ratings from users who do not receive a minimum number of ratings themselves.

Of course this could be circumvented by creating enough fake accounts to raise each other above the minimum, but again countered by looking for voting islands. Depends how far you want to take the anti cheat measures. There will always be loopholes.

Introducing a Karma system that accounts for participation may help too - this has been discussed in a few other threads.
https://itch.io/t/527869/karma-system-for-the-game-jams
https://itch.io/jam/gmtk-2019/topic/527912/show-some-respect-to-other-participants

****

Also, I really hope Mark acknowledges and discusses these rating problems in a video, I'd be keen to understand his perspective as the event organiser.

sztrovacsekSubmitted4 years ago(+1)

Hello, it was my first jam, so the ranking and everything is totally new for me, but here are my ideas for future jams (I don't know if they are possible to do, or not, but I'll share them anyway :P)
- only participants should vote
- if it's still public voting, there should be an option to switch between the final rankings by the public votes, and the participant only votes
- only those should be able to vote, who registered before the jam started
- if you vote, you need to give written feedback as well
- your star rating should be visible under your written rating, to prevent those who write nice things in their comment, but give 1 star for everything
- the previous might be only visible for the person who gets the ratings, not everyone, and they could report it, if they think something is shady
- it should be mandatory to vote for at least 10 (or other amount of) games, since it's a community event; those who didn't do so should be excluded from the final rankings
- at the end of the voting period, there should be an extra day to vote only for those games that have less than 10 (or other amount of) votes
- if a game still didn't get at least 10 (or other amount of) votes, then it should be excluded from the final rankings
- it's hard to check if someone really played a game or not, but people shouldn't be able to vote for a game until they actually download it or play the browser version

5youSubmitted4 years ago(+1)

Those are good, applicable ideas and they would make for a fair game jam.

But the "only participants should vote" rule would turn it into an event that is folded in on itself. Not that it is bad, but it's almost a political choice at this point. And I think it is up to Mark Brown to make that choice as it is, after all, his jam.

Personally, I believe that getting "normal players" to play my games and discuss them is more interesting than only doing it with devs. Ultimately games are meant to be played. For the players, getting to give feedback is fun and is an incentive to participate. It's more important than the contest that goes along with the jam.

sztrovacsekSubmitted4 years ago

Yes, I understand that, but open voting has a chance to turn into a "who-has-a-bigger-following-base-on-random-social-media" contest. But even if they can't vote, they could still play the game, so you are able to get valuable feedback. And at the end of the day that's what matters anyway. That's why I like that there are no prizes: it's mainly for learning and getting to know other devs.

5youSubmitted4 years ago (2 edits) (+1)

good point.

And I guess there's always gonna be some cheating in one form or another. But I doubt the few cheaters who might participate in game jams would end up being the talented geniuses who win anyway, as long as the actual best games win. So I'm not so sure chasing them is as relevant as focusing on electing the best games and ideas, which should be the games and ideas players love.

WylterSubmitted4 years ago(+5)

I'm sorry, but I think this approach is a bit wrong, because it doesn't emphasize the fact that games with more votes have a more precise score. It's like saying that every game that has more than 10 ratings has a 100% precise score.
I think a Bayesian approach would have given a better result.

For example, the formula for calculating the Top Rated 250 Titles in IMDb gives a true Bayesian estimate:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where: R = average for the movie (mean) = (Rating)

v = number of votes for the movie = (votes)

m = minimum votes required to be listed in the Top 250 (currently 25000)

C = the mean vote across the whole report

I believe that, with the same formula, if we substitute m with the median (in this case 10), we would get a more accurate estimate of what the game score should be.
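As a quick way to try this out, here is a minimal Python sketch of the weighted rating formula above with `m` set to 10. The jam-wide mean `C = 3.0` and the two example games are made-up numbers for illustration, not values from the results file.

```python
def weighted_rating(R, v, m, C):
    """Bayesian-style weighted rating: (v / (v + m)) * R + (m / (v + m)) * C.

    R: average score of the game
    v: number of ratings the game received
    m: prior weight (here the median number of ratings); must be > 0
    C: mean score across all games
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


M = 10   # median number of ratings, as suggested above
C = 3.0  # assumed jam-wide mean score (made up for this example)

few_ratings = weighted_rating(R=5.0, v=10, m=M, C=C)   # perfect average, 10 ratings
many_ratings = weighted_rating(R=4.5, v=90, m=M, C=C)  # lower average, 90 ratings
```

With these numbers the game with 10 perfect ratings ends up at 4.0, while the game averaging 4.5 over 90 ratings ends up around 4.35, so a larger number of ratings pulls the estimate closer to the game's own average instead of the jam-wide mean.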

PuffPastryPrinceSubmitted4 years ago(+1)

Here you go link

leafoAdmin4 years ago

I'm sorry, but I think this approach is a bit wrong, because it doesn't emphasize the fact that games with more votes have a more precise score.

Thanks for the feedback. The goals of IMDB ranking are a lot different than ranking jam entries. For a jam you have to be careful about giving an advantage to popular games because it makes it impossible to compete if you don't have access to an audience of people. IMDB doesn't really care about this issue, they are happy to show what is both popular and highly rated because it boosts conversion rate on their list.

That being said though, I think using Bayesian average is a good formula, and maybe will be added as an option for future game jams.

jlnprssnr4 years ago (1 edit)

Maybe you can also have a look at an algorithm like Wilson Score
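For reference, a small Python sketch of the lower bound of the Wilson score interval. The Wilson score works on positive/negative counts, so applying it to star ratings needs some mapping (for example, treating 4 or 5 stars as "positive"); that mapping is an assumption here and is not something the thread defines.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    positive: number of positive ratings
    total: total number of ratings
    z: normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if total == 0:
        return 0.0
    p = positive / total
    denominator = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denominator


# Same share of positive ratings, different sample sizes (made-up numbers).
small_sample = wilson_lower_bound(8, 10)
large_sample = wilson_lower_bound(80, 100)
```

Both examples have 80% positive ratings, but the one with 100 ratings gets a higher lower bound, because more ratings mean less uncertainty about the true proportion.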

5youSubmitted4 years ago (2 edits)

I think the voting period should have been longer, like 2 weeks at least, with an arbitrary threshold of ratings needed to avoid a ranking penalty that leaves most ratings intact. With enough time and a homepage that, by default, presents games without enough ratings, it would be more fair.

Also, all games are in the same jam even if some projects were much bigger than others. A separate result for solo developers might help sort all of this out.

sztrovacsekSubmitted4 years ago

I agree with the separation between the solo devs and the teams. There can be massive differences, and it's really not fair to compare them to each other. Although there is room for cheating here (submitting as solo when in reality it was made by a team), let's just hope everybody is being honest about it. :)