So I have made the first step for my cheater detector. A simple text comparator!
Here is the source code:
https://gist.github.com/Voidsay/d84a64fc94e79a31c1a005da1245c2cc
It will be able to detect copy-pasted comments, as well as slightly misspelled or slightly varied ones.
Next I will figure out how to go through all participant comments and compare them to each other to calculate a final similarity score for each participant.
Hopefully I will then be able to see a clear difference between normal commenters and dirty dirty cheaters.
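For anyone curious what that per-participant score could look like, here is a rough Python sketch. It is not the code from the gist. It assumes "compare them to each other" means averaging the pairwise similarity of one user's own comments, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in comparator. The names `similarity` and `participant_scores` are made up for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity between two comments, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def participant_scores(comments_by_user: dict[str, list[str]]) -> dict[str, float]:
    """Average pairwise similarity of each user's own comments.

    Assumption: a high average hints at copy-pasted comments.
    Users with fewer than two comments get 0.0 (nothing to compare).
    """
    scores = {}
    for user, comments in comments_by_user.items():
        pairs = list(combinations(comments, 2))
        if not pairs:
            scores[user] = 0.0
            continue
        scores[user] = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return scores
```

Note that comparing every pair grows quadratically with the number of comments per user, so a commenter with hundreds of posts would take noticeably longer to score.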
I just posted a new topic where I ask if people can help me reach 20 ratings (I needed 3 more).
Someone answered it and wrote in the comments that it is a "cool time killer". My game is a puzzle stealth game. I have no idea how it could be a "time killer", and chances are that he didn't even play the game or read what it is about. I was really afraid to make that post, and I actually asked there for more detailed comments to avoid exactly this, but people just don't read the thread.
It is just so disheartening.
Looked at the dude in question. Exactly what I need to test the algorithm. This is the type of comment I want to boink with my rubber hammer of justice.
I expect that it will take a couple more days until my API is fully operational, since I have never done one before. Plus I might need to get extra permission from itch.io and approval from the jam host.
I am not writing a full auto ban hammer. Just a tool for the mods to use to look in the right places. As far as I understand, thedutchmagikarp is the only one doing all the moderation and support on this jam. And he doesn't even have proper moderation tools! That's what I want to change (and also increase my qualifications, programming is pretty cool after all).
So I have been working all day on what I am from now on calling the "comment uniqueness index rater" (referred to as CUIR in this post).
Turns out interacting with a website is actually pretty easy! Almost as if people have been doing this for decades and created special tools, libraries and packages... *big hmmm*
I am now able to scrape the jam page for all the usernames, the game names and the URLs. Following the links to the game pages, collecting all the comments and running them through my CUI calculator should be a breeze now and will be done by tomorrow.
The whole functionality will be as follows:
Features:
Flaws:
When everything is done and in a presentable (and hopefully working) condition, I will release the source code on my GitHub for you to scrutinize. If we're lucky the program will do what it's supposed to do by the weekend.
I have been working steadily on the project. Currently I am battling the MySQL database I myself created. The whole deal with the UTF-8 character set really messes with everything, and displaying these characters in ASCII just looks awful and is unreadable.
I am setting up the database and will probably export it into an Excel file as well, so that everyone can check it out. It just takes a little time.
So I have finished my program. Unfortunately the algorithm didn't give me the spike in the cheater ratings that I had hoped for. Some other, more advanced process needs to be found to catch 'em all. As it stands, it is more hit or miss. The only thing that might be interesting is to look up the people with more than 100 posts under games. I don't trust 'em, but the algorithm says they're cool.
In any case I have learned a lot and will refine my skills further.
If you are interested in your "performance" you can look up your own score here:
https://itch.io/jam/brackeys-4/topic/931373/i-rated-you-comments-see-the-results...
I tried to write a little something something to detect such behavior. Sad to say my cheater detection attempt didn't work out quite as well as I hoped.
The ratings need the comments of professionals. Those are the most valuable, since they know what they are talking about. Sadly they are also the busiest and can't rate all day every day.
Suggestion: Use the itch.io server-side API to check the state of every user on every game with a GET request, then process that information and store it in the MySQL database. If the state is not_viewed, set a flag to true in the database. Then you could make an event: when somebody comments while not_viewed is set to true, do a POST request to post a message saying something along the lines of: 'You cannot rate before viewing the game'
What are you doing in this dead thread?
There is no problem with the scraping itself. My C# script works perfectly fine and the database fits all the data well. The problem is the analysis itself. I hoped that the Jaccard index would be enough and would create a bump in the bell curve somewhere in the lower scores, indicating an anomaly, aka the cheaters. Unfortunately that didn't happen at all. In fact some confirmed "cheaters" scored close to the middle.
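For readers who haven't met it: the Jaccard index of two sets is the size of their intersection divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|. Below is a minimal Python sketch, not the C# script. Splitting comments into lowercase word sets is an assumption made for this example (the actual tokenization isn't described here), and `tokens` and `jaccard` are illustrative names.

```python
import re


def tokens(text: str) -> set[str]:
    """Split a comment into a set of lowercase words (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two comments' word sets: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # convention chosen here: two empty comments count as identical
    return len(ta & tb) / len(ta | tb)
```

With this tokenization, `jaccard("cool game", "cool music")` shares one word out of three distinct words, so it comes out at 1/3. Short generic comments from honest players also share a lot of common words, and a misspelled word counts as a completely different token, which may be part of why the scores don't separate cleanly.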
I kind of expected the algorithm to fail, since it is the barest of bones. All I got from it was a little scraping and database experience, as well as a sort of inaccurate comment uniqueness rating. There is another thread about this, though.
Here's another solution: Use the itch.io server-side API to check the state of every user on every game with a GET request, then process that information and store it in the MySQL database. If the state is not_viewed, set a flag to true in the database. Then you could make an event: when somebody comments while not_viewed is set to true, do a POST request to post a message saying something along the lines of: 'You cannot rate before viewing the game'
I had a look at the API, but there are a couple of problems.
As far as I can tell, the API is mostly intended for doing fancy things with your own games: quickly changing prices, making an auto-reply bot, things like that. It is complete overkill for simple GET requests.
It also requires the users to accept one thingy (I completely forgot what it's called). Long story short, I can't force this and it doesn't give me anything that I could use for my purpose.
I am unsure how I would even go about implementing your post suggestion. Write a silly comment? I don't have access to the server, I can't make a popup and lock the ratings.
Also I am new to APIs and the documentation is way too hard to read/implement. Maybe I'm just dumb again and it's super useful, but my workaround is fine.
I don't know about requesting the state of every user and game constantly. Firstly, it took me about half an hour just to collect all the data, which is too long: it would let peeps slip through (I might optimize it to make it faster, but it mostly depends on the internet connection, which I can't change). Secondly, I think that itch.io would kick me for attempting to DoS their server.
Also the automatic ranking system already nerfs suspicious ratings. They look at all the star ratings you gave and shrink their weight if they don't follow the bell curve. This means that "randomly" clicking on stars (humans can't click randomly) or review bombing with no regard for the game will probably result in an uneven distribution, which in turn results in your ratings having zero weight in the final score. Cheating that might be more difficult than actually playing the games.
Besides this isn't really what my cheater detection was about. It's more about low effort comments copy pasted to dozens of games. That's what I wanted to detect. People that just want to get you to play their game and don't want to put in the necessary legwork.
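To make that goal concrete, here is a small Python sketch of the simplest version of the idea: flag a user when the same comment, after stripping case and punctuation, shows up under several different games. This is only an illustration, not my tool. The `(user, game, comment)` tuple layout, the function names and the `min_games` threshold are all assumptions made up for the example.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase the comment and drop punctuation and extra whitespace."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def repeated_comments(posts, min_games=3):
    """Find comments a user posted under at least `min_games` different games.

    posts: iterable of (user, game, comment) tuples (hypothetical layout).
    Returns {user: {normalized_comment: number_of_games}}.
    """
    games = defaultdict(set)
    for user, game, comment in posts:
        games[(user, normalize(comment))].add(game)

    flagged = defaultdict(dict)
    for (user, text), seen in games.items():
        if text and len(seen) >= min_games:
            flagged[user][text] = len(seen)
    return dict(flagged)
```

This only catches exact repeats after normalization. Slightly reworded or misspelled copies would still need a fuzzy comparison on top.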
I have a YouTube channel, if you're planning to program in other programming languages: https://youtube.com/channel/UCqj9jELS5ayGGl8WES0x70w