The Mechanistic Interpretability Hackathon

Hosted by Esben Kran, Neel Nanda, Apart Research, Zaki, fbarez · #alignmentjam

Results

15 entries were submitted between 2023-01-20 16:00:00 and 2023-01-23 03:15:00. 52 ratings were given to 15 entries (100.0%) between 2023-01-23 03:15:00 and 2023-01-25 14:00:00. The average number of ratings per entry was 3.5 and the median was 3.

Criteria	Rank	Score*	Raw Score
ML Safety	#1	3.750	3.750
Novelty	#1	4.500	4.500
Generality	#5	3.000	3.000
Mechanistic interpretability	#5	4.250	4.250
Reproducibility	#5	4.000	4.000

Criteria	Rank	Score*	Raw Score
ML Safety	#2	3.674	4.500
Reproducibility	#13	2.041	2.500
Generality	#14	1.633	2.000
Mechanistic interpretability	#14	1.225	1.500
Novelty	#14	2.041	2.500

Criteria	Rank	Score*	Raw Score
Reproducibility	#1	4.400	4.400
Generality	#3	3.400	3.400
ML Safety	#7	3.200	3.200
Mechanistic interpretability	#8	3.800	3.800
Novelty	#12	2.600	2.600

Criteria	Rank	Score*	Raw Score
Judge's choice	#1	n/a	n/a
Reproducibility	#1	4.400	4.400
Mechanistic interpretability	#2	4.400	4.400
Novelty	#3	4.200	4.200
Generality	#11	2.800	2.800
ML Safety	#11	2.800	2.800

Criteria	Rank	Score*	Raw Score
Reproducibility	#12	2.449	3.000
Generality	#14	1.633	2.000
ML Safety	#14	1.633	2.000
Mechanistic interpretability	#15	0.816	1.000
Novelty	#15	0.816	1.000
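
The Score* column differs from Raw Score only for some entries, and the differences are consistent with a penalty for entries that received fewer ratings than the median. The sketch below rests on an assumption that is not stated on this page: that the adjustment is the raw score multiplied by the square root of min(median, ratings) / median. The per-entry rating counts are taken from the ranked list in this document; the function name `adjusted_score` is illustrative only.

```python
import math
import statistics


def adjusted_score(raw_score: float, num_ratings: int, median_ratings: float) -> float:
    """Scale a raw score down when an entry has fewer ratings than the median.

    Assumed formula (not confirmed by the page):
        raw_score * sqrt(min(median_ratings, num_ratings) / median_ratings)
    """
    if median_ratings <= 0:
        # No meaningful median to compare against; leave the score unchanged.
        return raw_score
    weight = math.sqrt(min(median_ratings, num_ratings) / median_ratings)
    return raw_score * weight


# Number of ratings per entry, in the order of the ranked list on this page.
ratings_per_entry = [4, 2, 4, 7, 3, 4, 5, 3, 2, 2, 5, 4, 3, 2, 2]
median_ratings = statistics.median(ratings_per_entry)

# An entry with 2 ratings and a raw ML Safety score of 4.500.
print(round(adjusted_score(4.5, 2, median_ratings), 3))

# An entry with 4 ratings (at or above the median) keeps its raw score.
print(round(adjusted_score(3.75, 4, median_ratings), 3))
```

Under that assumption, a raw score of 4.500 from 2 ratings comes out at 3.674, matching the ML Safety row ranked #2, while entries with at least the median number of ratings keep their raw score.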

Results

by Esben Kran, ElliotJDavies, h6

Ranked 1st in ML Safety with 4 ratings (Score: 3.750)

by fbarez

Ranked 2nd in ML Safety with 2 ratings (Score: 3.674)

by Giles, soy.cola

Ranked 3rd in ML Safety with 4 ratings (Score: 3.500)

by cmathw

Ranked 4th in ML Safety with 7 ratings (Score: 3.429)

by roksanagow

Ranked 5th in ML Safety with 3 ratings (Score: 3.333)

by mentaleap

Ranked 6th in ML Safety with 4 ratings (Score: 3.250)

by StefanHex

Ranked 7th in ML Safety with 5 ratings (Score: 3.200)

by chris-lons, victorlf4

Ranked 8th in ML Safety with 3 ratings (Score: 3.000)

by jakub151

Ranked 9th in ML Safety with 2 ratings (Score: 2.858)

by Yoann Poupart

Ranked 9th in ML Safety with 2 ratings (Score: 2.858)

by clementneo

Ranked 11th in ML Safety with 5 ratings (Score: 2.800)

by lomichelle42

Ranked 12th in ML Safety with 4 ratings (Score: 2.750)

by MatthewBaggins

Ranked 13th in ML Safety with 3 ratings (Score: 2.667)

by Al-Hitawi Mohammed

Ranked 14th in ML Safety with 2 ratings (Score: 1.633)

by roksanagow

Ranked 14th in ML Safety with 2 ratings (Score: 1.633)