A jam submission

Automated Model Oversight Using CoTP

Submitted by adamkhoja1 — 47 minutes, 57 seconds before the deadline

Results

Criteria         Rank    Score*    Raw Score
Reproducibility  #2      4.500     4.500
Topic            #2      4.000     4.000
Generality       #3      3.500     3.500
Overall          #3      3.500     3.500
Novelty          #5      3.000     3.000

Ranked from 4 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.

Judge feedback

Judge feedback is anonymous and shown in a random order.

  • It's a cool project that uses chain-of-thought, step-by-step reasoning to improve the model's ability to identify harmful outputs. More concrete examples and standard metrics (e.g., precision/recall) would help me understand the results better.
  • This is a solid attempt to use LMs to oversee LMs and definitely in the spirit of the challenge. It could have benefited from a clearer articulation of what exactly the research question was, although implicitly it seemed to be "does using CoT enable better automated evaluation for this task and dataset". The conclusions that can be drawn are somewhat limited, as the imbalanced and small dataset does not allow for statistically significant comparisons, but it seems like it was valuable to the authors as a learning pilot to help identify issues and considerations that could help prepare for more substantial research of this kind in the future.
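For readers unfamiliar with the metrics the first judge mentions, the sketch below shows one way precision and recall could be computed for a binary harmful / not-harmful evaluation. The function name, labels, and predictions here are illustrative assumptions, not taken from the submission or its results.

    # Minimal sketch: precision/recall for a binary harmful (1) vs. benign (0) task.
    # All values below are made up for illustration.

    def precision_recall(y_true, y_pred):
        """Return (precision, recall) for binary labels where 1 = harmful."""
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        return precision, recall

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical overseer verdicts
    print(precision_recall(y_true, y_pred))  # (0.75, 0.75)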

What are the full names of your participants?
Adam Khoja, Rishi Khare, John Wang

Comments

No one has posted a comment yet