Results
| Criteria | Rank | Score* | Raw Score |
| --- | --- | --- | --- |
| Judge's choice | #2 | n/a | n/a |
| Topic | #3 | 3.753 | 4.333 |
| Novelty | #3 | 3.753 | 4.333 |
| Generality | #4 | 2.887 | 3.333 |
| Reproducibility | #4 | 4.041 | 4.667 |
| Overall | #4 | 3.175 | 3.667 |
Ranked from 3 ratings. *Score is adjusted from the raw score by the median number of ratings per game in the jam.
Judge feedback
Judge feedback is anonymous and shown in a random order.
- This is an impressively thorough (given the 48-hour time limit!) investigation of a debate-like task. The proposed class of benchmarks for LLMs, using cooperative, asymmetric-information language games, is an excellent idea, and this is a solid proof of concept for it. Quantitative and qualitative investigation of LM performance on such benchmarks could give insight into issues that schemes like debate may run into in practice, and into whether such issues can be patched.
- Looks like a really cool project! Some extensions that might be interesting: would it be possible to plan ahead, either by recursively calling the LM itself or by using chain-of-thought prompting to perform planning? It might also be interesting to look beyond 2 words, which seem to be quite easy for humans (e.g., I usually start to struggle only when aiming for 3 words). Overall: I think this is a useful phenomenon to understand for deep learning models, though it seems loosely related to scalable oversight. Scalable oversight: loosely related. Novelty: seems like an interesting property to understand for deep learning models. Generality: focuses on a specific capability (theory of mind) of the model. Reproducibility: seems reproducible.
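The "plan ahead by recursively calling the LM" idea from the feedback above can be sketched as a clue-giver that simulates the guesser before committing to a clue. This is a hypothetical illustration, not the project's actual code: `pick_clue`, `guess_fn`, and the toy word lists are all invented here, and the simulated guesser is a stub standing in for a real LM call.

```python
from typing import Callable, List

def pick_clue(
    targets: List[str],
    candidates: List[str],
    guess_fn: Callable[[str], List[str]],
) -> str:
    """Choose the candidate clue whose simulated guesses best match the targets.

    `guess_fn` stands in for a recursive LM call: given a clue, it returns the
    words the simulated guesser would pick. The clue-giver "plans ahead" by
    rolling each candidate clue through this simulated guesser and scoring the
    overlap with the intended target words.
    """
    def score(clue: str) -> int:
        return len(set(guess_fn(clue)) & set(targets))
    return max(candidates, key=score)

# Toy simulated guesser (a stand-in for an actual LM call): it guesses the
# two vocabulary words sharing the most letters with the clue.
def toy_guesser(clue: str) -> List[str]:
    vocab = ["ocean", "wave", "dune", "cactus"]
    overlap = lambda w: len(set(clue) & set(w))
    return sorted(vocab, key=overlap, reverse=True)[:2]

best = pick_clue(["ocean", "wave"], ["water", "sand"], toy_guesser)
print(best)  # "water": its simulated guesses cover both targets
```

In a real setup, `guess_fn` would prompt the same (or another) model to play the guesser role, so the clue-giver's search is a one-step lookahead through the model itself.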
What are the full names of your participants?
Samuel KNOCHE
What is your team name?
Player of Games