This jam is now over. It ran from 2022-09-30 16:00:00 to 2022-10-02 15:00:00.

Join us on GatherTown here!

See how you can upload your project to the hackathon below:

See the example project code and report here: https://drive.google.com/drive/u/1/folders/10tTmTVK3T7s6_1Dgft4ZnkkzvrpUbIf5.

🧑‍🔬 Join us for this month's alignment jam!

Join this AI safety hackathon to compete in uncovering novel aspects of how language models work! This follows the "black box interpretability" agenda of Buck Shlegeris:

Interpretability research is sometimes described as neuroscience for ML models. Neuroscience is one approach to understanding how human brains work. But empirical psychology research is another approach. I think more people should engage in the analogous activity for language models: trying to figure out how they work just by looking at their behavior, rather than trying to understand their internals.

Instructions

We will work on ideas from the black box investigation project on aisafetyideas.com. See the ideas here: https://aisafetyideas.com/project/black-box-investigation. Use them as inspiration for your research projects.

You will work in groups of 2-6 people within our hackathon GatherTown.

How to participate

Create an account on this website (itch.io) and click participate. We will assume that you are going to participate, so please cancel if you won't be part of the hackathon.

Submission

When the hackathon starts, you will be able to upload new PDFs, Google Docs, Colab notebooks, and GitHub repositories at any point during the research competition.

Each submission will be evaluated by a group of judges on a 1-10 scale for 5 different qualities.

Criterion	Weight	Description
Alignment	2	How good are your arguments for how this result informs the long-term alignment of large language models? How informative are the results for the field in general?
AI Psychology	1	Have you come up with something that might guide the “field” of AI Psychology in the future?
Novelty	1	Are the results new, and are they surprising compared to what we expect?
Generality	1	Do your research results show a generalization of your hypothesis? E.g. if you expect language models to overvalue evidence in the prompt compared to evidence in their training data, do you test more than just one or two different prompts? A top score might require statistical testing on 200+ prompt examples.
Reproducibility	1	Are we able to easily reproduce the research and do we expect the results to reproduce? A high score here might combine high Generality with a well-documented GitHub repository that reruns all experiments.
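As a rough illustration of the statistical testing mentioned under Generality, the sketch below runs a one-sided exact binomial test over a batch of prompts. It assumes each prompt yields a binary outcome (for example, "the model followed the evidence in the prompt" or not) and that the null hypothesis is a 50% chance rate. The function name and the tally are hypothetical placeholders, not results from any real experiment.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= successes) under the null rate."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical tally, NOT real data: out of 200 prompts, the model sided
# with the in-prompt evidence 128 times.
n_prompts = 200
n_followed_prompt = 128

p_value = binomial_p_value(n_followed_prompt, n_prompts)
print(f"{n_followed_prompt}/{n_prompts} prompts, one-sided p = {p_value:.2g}")
```

A small p-value here only says the behavior is unlikely under the assumed chance rate. Whether 50% is the right baseline depends on how your prompts are constructed.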

Schedule

The schedule makes space for 46 hours of research jamming. You can decide your commitment level during the jam with your teammates but we encourage you to remember to sleep and eat.

See further details here:

Friday, 30th September: 6-7 PM CET | 8-9 AM PST | 9:30-10:30 PM IST	Introducing the alignment jam and black box investigation, cognitive psychology, and how to use the pre-configured code for the hackathon. Afterwards, everyone splits into teams, either the ones we make or custom teams.
Friday, 30th September: 7 PM CET until Sunday, 2nd October: 4 PM CET | 6 AM PST | 7:30 PM IST	Everyone joins the GatherTown for as long as they're awake and works in groups within the space. We will walk around, answer questions, and help or give feedback on your projects.
Sunday, 2nd October: 4-8 PM CET | 6-10 AM PST | 7:30-11:30 PM IST	The judges begin judging the submissions. We expect you to have submitted your final submission by 4 PM CET, but you will be able to continue submitting throughout the judging time.
Sunday, 2nd October: 8 PM CET | 10 AM PST | 11:30 PM IST	Presenting the winners and awarding the prizes, with 50% for the top submission, 30% for second place, 15% for third place, and 5% for fourth place. At the moment, the total prize pool is $1,000, but this might increase as we get closer to the hackathon.
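For a quick sense of what the prize split means in dollars, here is a minimal sketch that applies the stated percentages to a given pool. The names `PRIZE_SHARE_PERCENT` and `prize_amounts` are illustrative only, and the $1,000 figure is the pool quoted above, which may change.

```python
# Percentage of the prize pool awarded to each place, as stated in the schedule.
PRIZE_SHARE_PERCENT = {1: 50, 2: 30, 3: 15, 4: 5}


def prize_amounts(total_pool: float) -> dict[int, float]:
    """Split a prize pool across the four winning places."""
    return {
        place: total_pool * percent / 100
        for place, percent in PRIZE_SHARE_PERCENT.items()
    }


for place, amount in prize_amounts(1000).items():
    print(f"Place {place}: ${amount:,.2f}")
```

With the current $1,000 pool this works out to $500, $300, $150, and $50 for first through fourth place.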