AI Testing Hackathon


A jam submission

This Is Fine(-tuning): A benchmark testing LLMs robustness against bad fine-tuning data

Submitted by JanWehner — 18 minutes, 6 seconds before the deadline

Ranked from 3 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.

Judge feedback is anonymous.

An interesting approach to measuring adversarial data impacts! It’s probably hard to generalize this without creating a new benchmark per task, but thinking more about the general direction of performance falloff is very much encouraged.

Where did you participate?
Delft

What are the full names of the participants?
Jan Wehner, Joep Storm, Tijmen van Graft, Jaouad Hidayat

What is your team name?
Alignment Avengers
