Attention Phrenology: A spatial classification of attention heads
Results
| Criteria | Rank | Score* | Raw Score |
| --- | --- | --- | --- |
| Novelty | #1 | 4.500 | 4.500 |
| Generality | #2 | 4.000 | 4.000 |
| ML Safety | #3 | 3.500 | 3.500 |
| Mechanistic interpretability | #7 | 4.000 | 4.000 |
| Reproducibility | #10 | 3.500 | 3.500 |
Ranked from 4 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.
Judge feedback
Judge feedback is anonymous and shown in a random order.
- Funky. I find this hard to evaluate, since the vibe of the project is very much "do a massive scatter-shot approach of trying a ton of things". I think many of the things you tried seem broadly reasonable (at least in the sense that, if you tried them and got cool results, I'm at least mildly interested). I think the project would have been stronger if you had spent longer digging into any one result and seeing what you could learn. As it is, my vibe is something like "you had a lot of creativity, the result that component 0 is self-attention and component 1 is prev-token attention is cool, but I otherwise don't really learn anything that feels substantial from this". But you also managed to try an impressive number of ideas! Idk. The project would have been cleaner if you'd used a 1L attention-only model instead of a 1L model with MLPs, and it's less clear to me that attention heads are interesting on 16-token prompts. -Neel
- This is very creative work and seems like a promising direction for investigating mechanistic interpretability in relation to inter-model activation and circuit differences. I would be interested in seeing this work expanded with Arthur Conmy's Automatic Circuit Discovery (ACDC) tool (https://arthurconmy.github.io/automatic_circuit_discovery/) to identify how similar circuits differ between your nine models. Really exciting work!
What are the full names of your participants?
Giles Edkins, Keira Wiechecki
What is your team name?
Phrenologists
Does anyone from your team want to work towards publishing this work later?
No
Where are you participating from?
Online