
Calculated ratings for jams vs raw scores?

A topic by Jacic created Jan 01, 2020 Views: 1,716 Replies: 10
Viewing posts 1 to 4
(+1)

Hi there, sorry if this has been discussed elsewhere (I couldn't find it under the search terms I was using). I was just wondering what the rationale behind penalizing jam entries with fewer votes is? Particularly for smaller jams, it means half the field is sometimes getting marked down for having a couple, or sometimes only one, vote fewer than the upper half. I'm guessing it's in there to stop a game that has only 1-2 votes winning over others with many times more votes, but does that happen often? Anyway, I'm not criticizing the system, as I'm guessing there's a good reason for it; I'm more just curious as to why it's run that way rather than on raw scores.

Admin(+2)

Because a project that gets one rating that happens to be a perfect score shouldn’t mean it’s the best game in the jam. Statistically speaking, you can say that the more ratings something has, the more accurate the rating is. Something with very few ratings is likely to be significantly off from what it would have gotten had everyone in the jam had the opportunity to rate it.

I’m guessing it’s in there to stop a game that has only 1-2 votes winning over others with many times more votes, but does that happen often?

Yes, it would pretty much happen in most jams, since it's common for people to ignore projects they don't think look good. It would be very upsetting for people who put a lot of effort into their projects to get beaten by an ignored project that got one vote.

Another advantage of penalizing projects with very few ratings is that it encourages people to want to get more ratings, and the best way to do that is rating other people’s projects and leaving a comment. It helps build a better jam community.

The rating adjustment on itch.io works by looking at the median number of ratings for the entire jam, so it adapts to how much people are rating in that jam, and doesn’t get skewed by a minority of projects that receive a large number of ratings.

Particularly for smaller jams

I’m not sure how small you’re talking about, but if the entire jam only has a few ratings in total then at that point the final results are probably not valid in any case.

Hope that explains it.

Admin moved this topic to Game Jams
(6 edits) (+1)
 Another advantage of penalizing projects with very few ratings is that it encourages people to want to get more ratings

Sorry, not at all. How it really works is (from my experience in 2021):

- you have no idea that there is a penalty, and how many ratings are needed at least to not get the penalty

- you get a few ratings and think it's enough

- jam voting ends

- you get penalized

Encouraging and punishing are two opposite things.

Personally I would prefer the Ludum Dare system: a fixed minimum number of ratings, and a clear message about it.

I’m not sure how small you’re talking about, but if the entire jam only has a few ratings in total then at that point the final results are probably not valid in any case.

And that's the argument for the Ludum Dare system.

(1 edit)

Oh wow, I posted this years ago, but the problem still stands. I entered a jam recently where the most popular entries were topping out at fewer than 30 votes (a niche interest area running a friendly jam). I'm assuming that, in order for the lower half of the field not to get heavily penalised for having a few fewer votes than the top half, the people running it did their own manual calculations for the results release. It seemed unfair for an entry to slide a few places down in the rankings (and in one case it affected who won a category) because of a difference of a few votes.

Being a niche area means it is quite hard to get "lots of votes". More interaction can mean more votes, which is a good thing, but when the penalty line can come down to single digits, it's pretty disheartening for games that slide under that barrier. (And let's face it, about half the entries in a comp are going to go that way no matter how interactive everyone is or is not. My game was not one of those affected this time, so that's not why I'm complaining; I just feel that for some comp types it isn't very fair.)

In theory it's a good idea to stop an unpopular entry with a handful of perfect votes from the dev's friends winning over a more popular and better-made one with 200 votes across a range of scores, but it just doesn't work for smaller jams where, if you get 20 votes, your score might be unaffected, but if you get 19, you're penalised. An option like Igor suggested, or just working off raw scores rather than calculated ones, would be good for those. I'm assuming that's not an option, though, considering it's been this way for at least the three years since I posted this thread.

I think the most important argument here is that not voting counts as a vote of sorts. People can see game descriptions and thumbnails without ever playing the game, so voters choosing not to try out a game should also be taken into account.

It can be argued exactly how it should be weighted, but I have no doubt that it should be done if one wants to call the results "fair". Even for small numbers.

Making people vote on a minimum number of games is a good idea for various reasons, among them dampening the effect of having your mum vote on your game and your game alone.

And I agree with what leafo said about small jams and vote numbers. Whatever raw votes you had would be inaccurate to begin with. It is like measuring the thickness of a sheet of paper with a ruler when you only have 5 sheets: you need tricks like folding the sheets to get anything resembling an accurate result.

Admin (4 edits)

Please read my response here as it gives some clarification: https://itch.io/post/8993127

The formula for jam ratings is adjustment = sqrt(min(median, votes_received) / median)

The adjustment for a game that got 19 votes in a jam with a median of 20 is sqrt(19/20) ≈ 0.97468, about a 2.5% reduction in average score; e.g. a 4.5 rating would become about 4.386 for the purpose of comparison during ranking.
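For anyone who wants to check their own numbers, here is a small sketch of the formula as stated above (the function name is mine, not itch.io's actual code):

```python
from math import sqrt

def adjusted_score(avg_rating, votes_received, median_votes):
    """Apply the jam ranking adjustment described in the post:

        adjustment = sqrt(min(median, votes_received) / median)

    Games at or above the median number of votes are unaffected.
    """
    adjustment = sqrt(min(median_votes, votes_received) / median_votes)
    return avg_rating * adjustment

# The worked example: 19 votes, median 20, raw average 4.5
print(round(adjusted_score(4.5, 19, 20), 3))  # 4.386

# At or above the median, the score passes through unchanged
print(adjusted_score(4.5, 25, 20))  # 4.5
```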

(+1)

I don’t understand why the cutoff is at the median. No matter the game jam, 50% of submissions will always receive a score penalty.​ That number might be reasonable for something with 1000s of entries where score manipulation issues may be harder to deal with, but I don’t see why such a large range of submissions should always be pushed down just because they weren’t ranked enough times. Maybe it could be lowered to a 25th percentile for smaller jams - or better yet, selected by the jam host beforehand.

Admin (1 edit) (+1)

As a submitter, the goal isn’t to avoid the “penalty”. The point of the “penalty” is to provide a way to rank every game against every other game as fairly as possible.

I think the mistake here is referring to it as a penalty, because that suggests people did something wrong with their submission. In reality, it’s more appropriate to call it an adjustment, as it serves as a mechanism for fairer ranking across all submissions.

As an example: if your game got 10 votes with an average rating of 4.5, and someone else’s got 100 votes with an average rating of 4.5, do you think it’s fair that they rank above you? I think so, since they’ve managed to collect that many more votes while still maintaining a relatively high score.

If there is a fixed minimum number of ratings required, then it’s going to create a distortion in rankings around the people who collect exactly that number of ratings. A lower number of ratings creates higher variance in the average score compared to a submission with a higher number of ratings. There is a chance the lower-rating-count project has a non-representative score that could bump it above a game that got substantially more ratings.
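The variance point can be illustrated with a quick simulation (the rating distribution and numbers here are made up for illustration, not itch.io data):

```python
import random

random.seed(42)  # fixed seed so the demo is repeatable

def average_spread(n_votes, trials=2000):
    """Simulate `trials` entries that each receive `n_votes` ratings
    drawn uniformly from 1-5 stars, and return the gap between the
    luckiest and unluckiest resulting averages."""
    averages = [
        sum(random.randint(1, 5) for _ in range(n_votes)) / n_votes
        for _ in range(trials)
    ]
    return max(averages) - min(averages)

# Averages built from 5 votes swing far more wildly than averages
# built from 100 votes, even though every entry is "equally good".
print(average_spread(5), average_spread(100))
```

With few votes, an ordinary entry can land an extreme average purely by luck, which is exactly what the adjustment guards against.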

you get a few ratings and think it’s enough

If you care about your results in a ranked jam, then I recommend participating in the voting process throughout its duration. Rate other people’s work and leave comments; it’s the best way to let others discover your submission. If you’ve only collected a “few” ratings, then it’s unlikely your raw average score is representative in a way that can be compared to submissions that received substantially more ratings.

how many ratings are needed at least to not get the penalty

The goal is not to tell you to hit a target number of ratings, as that essentially encourages people to game the ranking system. Instead, your goal as a participant is to keep collecting ratings. The more ratings you collect, the more “accurate” your average score is relative to the number of ratings collected by all other submissions.

Thank you for the reply! I'm not sure about the itch.io crowd, but in Ludum Dare there are Twitter superstars who will of course collect 100 ratings, and since those ratings come from followers, I'd assume they're slightly in favor of the developer. I have no way to compete with them, so they are already "adjusted" higher; why also adjust me lower?

Admin (2 edits)

The adjustment is based on the median number of ratings, and superstars typically don’t represent the majority of the entries collecting votes. As an example, the largest jam we’ve hosted, which collected over 160k total ratings, with many entries collecting hundreds and even over a thousand ratings, has a median of just 16 ratings.

https://itch.io/jam/gmtk-2023/results/top-marks
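The median's resistance to a handful of heavily-rated entries is easy to verify (the vote counts below are invented for illustration, not from any real jam):

```python
from statistics import mean, median

# Hypothetical per-entry vote counts for a small jam
typical_jam = [5, 8, 10, 12, 16, 18, 20, 22, 25]

# Add a few "superstar" entries that collect hundreds of votes
with_superstars = typical_jam + [400, 950, 1200]

# The mean is dragged up dramatically, but the median barely moves,
# which is why the adjustment keys off the median.
print(mean(typical_jam), median(typical_jam))          # ~15.1, 16
print(mean(with_superstars), median(with_superstars))  # ~223.8, 19
```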

I’d assume they’re slightly in favor of the developer

This is debatable; increased attention on a page may actually skew raters' bias lower. In my experience, once the audience grows beyond a core set of fans into the general public, you attract people who are more likely to casually rate things very low.

(+1)

Ahhh, the dangerous assumption that all followers on social media are fans and wish the followed account all the best.

Sometimes fans are the harshest critics. After all, they know you could do better. Just look at those Marvel fans attacking some of their beloved Marvel movies.

The opposite is also true: there is a certain indie bias, for lack of a better word. If you see a nice game from an unknown dev that has few or no ratings, some people are more inclined to give a slightly better rating than they would have given the same game from an established "superstar". Or they would feel bad giving a 1-2 star rating, while having no such qualms about an established developer whose game is, in their opinion, "overrated".