A jam submission

Automated Identification of Potential Feature Neurons

Submitted by lomichelle42 — 8 hours, 34 minutes before the deadline


Criteria                      Rank  Score*  Raw Score
Judge's choice                #3    n/a     n/a
Mechanistic interpretability  #5    4.250   4.250
ML Safety                     #12   2.750   2.750

Ranked from 4 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.

Judge feedback

Judge feedback is anonymous and shown in a random order.

  • This is a wonderful project and plays right into mechanistic interpretability! This novel 3-step method is great for making neurons more interpretable and it enables quite a bit of deeper analysis. I recommend also reading Alex Foote's winning submission for the last interpretability hackathon which echoes some of your comments at the end: Great work!
  • Cool project! I'm excited to see Neuroscope being used like this (and I'm sorry you had to scrape the data - I need to get round to making the dataset available!). I liked the creativity and diversity of your methods, and I like the spirit of trying to automate things! Using GPT-3 and FastText are cool ideas. My main criticisms are that I think these descriptions tend not to be specific enough and miss nuance, e.g. neuron 134 in layer 6 of solu-8l-pile is actually a neuron that activates on the "1" in "Page: 1" in a specific document format in the Pile, and seems way more specific than the description given! I also think that tokenization is a massive pain that breaks up the semantic meaning of words into semi-arbitrary tokens, and I don't see how your method engages with that properly - it seems like it mostly doesn't involve the surrounding context of the word? I really liked the idea of substituting in synonym tokens for the current token. I'd love to see that done for the 5 tokens before the current token, and to try to figure out if we can find "similar tokens" in a principled way when the token is not just a word/clear conceptual unit. But yeah, overall, nice work!

What are the full names of your participants?
Michelle Wai Man Lo

Does anyone from your team want to work towards publishing this work later?


Where are you participating from?

