Interpreting Catastrophic Failure Modes in OpenAI’s Whisper

Results
| Criteria         | Rank | Score* | Raw Score |
| ---------------- | ---- | ------ | --------- |
| Novelty          | #1   | 3.889  | 3.889     |
| ML Safety        | #2   | 3.222  | 3.222     |
| Generality       | #4   | 3.222  | 3.222     |
| Interpretability | #5   | 3.556  | 3.556     |
| Reproducibility  | #11  | 3.667  | 3.667     |
Ranked from 9 ratings. *Score is adjusted from the raw score by the median number of ratings per game in the jam.
Judge feedback
Judge feedback is anonymous.
- Cool work! I'm pleasantly surprised that the logit lens works here, that you can remove so many encoder and decoder layers, and it's an interesting choice of problem. And cool use of PySvelte! My guess is that this failure comes from induction heads, which notice and respond to repeated patterns, so brief hiccups turn into robustly repeated sequences. Looking at which heads matter most for this behaviour would be interesting to me. Misc point: I believe that GPT-3 can also get caught repeating the same word (probably downstream of induction heads).
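The judge's induction-head hypothesis can be probed with the standard repeated-random-tokens diagnostic from Olsson et al. (2022). Below is a minimal, hypothetical sketch that is not part of the original submission: it scores every attention head in GPT-2 (which the judge notes shows the same repetition failure) for induction-like behaviour, assuming the HuggingFace transformers API. The same recipe would carry over to Whisper's decoder self-attention given audio inputs.

```python
import torch
from transformers import GPT2LMHeadModel

# Hypothetical sketch: score attention heads with the repeated-random-tokens
# diagnostic (Olsson et al., 2022). An induction head at position t in the
# second copy of a repeated sequence attends back to position t - T + 1,
# i.e. the token *after* the earlier occurrence of the current token.
torch.manual_seed(0)
model = GPT2LMHeadModel.from_pretrained("gpt2", attn_implementation="eager")
model.eval()

T = 50  # length of the random segment
segment = torch.randint(0, model.config.vocab_size, (1, T))
tokens = torch.cat([segment, segment], dim=1)  # segment repeated twice

with torch.no_grad():
    out = model(tokens, output_attentions=True)

# out.attentions is a tuple with one (1, n_heads, 2T, 2T) tensor per layer.
for layer, attn in enumerate(out.attentions):
    src = torch.arange(T, 2 * T - 1)   # positions in the second copy
    dst = src - T + 1                  # the token after the earlier match
    scores = attn[0, :, src, dst].mean(dim=-1)  # per-head induction score
    head = scores.argmax().item()
    print(f"layer {layer}: strongest head {head}, score {scores[head]:.3f}")
```

If the hypothesis holds, a handful of heads should score far above the rest; ablating those heads and checking whether the repetition loops weaken would be the natural follow-up the judge suggests.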
Where are you participating from?
London, UK
What are the names of your team members?
Edward Rees, John Hughes, Ellena Reid
What are the email addresses of all your team members?
edward.r.rees@gmail.com
Comments

Nice! It’s great to see steps towards interpretability of multimodal models.