A jam submission

In search of linguistic concepts: investigating BERT's context vectors

Investigating whether BERT's context vectors correspond to human-interpretable linguistic concepts
Submitted by roksanagow — 37 minutes, 6 seconds before the deadline

Results

Criteria                        Rank   Score*   Raw Score
ML Safety                       #5     3.333    3.333
Generality                      #5     3.000    3.000
Novelty                         #11    2.667    2.667
Mechanistic interpretability    #11    3.333    3.333
Reproducibility                 #14    2.000    2.000

Ranked from 3 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.

Judge feedback

Judge feedback is anonymous and shown in a random order.

  • This work is nicely done as a traditional machine learning task. However, using BERT visualization on the fine-tuned models may not be very useful. It would be beneficial to include more interpretability methods to support the conclusions and investigate fine-tuning, as this area is still under-studied.
  • There's actually been a fair amount of prior work on this kind of thing! Two relevant papers: https://arxiv.org/abs/1906.02715 and https://arxiv.org/abs/1905.05950. More generally, there's a whole subfield called BERTology on these kinds of questions: https://arxiv.org/abs/2002.12327. I think your motivation section is mostly false: as far as I know, there's been very little interpretability work on vision transformers, and attention patterns are, if anything, easier to interpret for language models than for image models. There's been a fair amount of interpretability work on classic image models like ConvNets and ResNets, but generally we don't interpret image models by "averaging over" inputs; other techniques like feature visualization are used: https://distill.pub/2017/feature-visualization/. The actual method used here was fairly legit, and is analogous to what's known in the literature as probing; here's a review: https://arxiv.org/pdf/2102.12452.pdf. It's generally easier to classify e.g. "anger vs. not anger" than a 7-class categorical problem like this, though you need the same number of anger and non-anger data points (or scale the loss for the anger ones to get comparable gradients). I'm pretty surprised that a two-layer BERT model could do such good fake-news classification! Honestly, this makes me suspect that the dataset is badly made or too easy. It wasn't clear to me where the two-layer BERT model came from; was it part of BERTVis? I'm impressed that you managed to fine-tune a language model in a weekend hackathon! That's a fair amount of effort. - Neel
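
The class-balanced probing setup described in the feedback above can be made concrete with a short sketch. This is not the team's pipeline: it assumes a stock bert-base-uncased encoder rather than the two-layer model used in the submission, placeholder sentences with binary "anger vs. not anger" labels, and scikit-learn's class_weight="balanced" as the loss-rescaling step; all names and data are illustrative.

```python
# Minimal probing sketch (illustrative, not the submission's code):
# mean-pool BERT context vectors, then fit a class-balanced linear probe.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def context_vectors(sentences, batch_size=32):
    """Return one mean-pooled final-layer vector per sentence."""
    pooled_batches = []
    with torch.no_grad():
        for i in range(0, len(sentences), batch_size):
            enc = tokenizer(sentences[i:i + batch_size], padding=True,
                            truncation=True, return_tensors="pt")
            hidden = model(**enc).last_hidden_state      # (batch, tokens, 768)
            mask = enc["attention_mask"].unsqueeze(-1)   # ignore padding tokens
            pooled_batches.append((hidden * mask).sum(1) / mask.sum(1))
    return torch.cat(pooled_batches).numpy()

# Placeholder data: replace with real sentences and a binary collapse of the
# emotion labels (1 = anger, 0 = everything else).
texts = [
    "I am absolutely furious about this.", "What a lovely day outside.",
    "This makes my blood boil.", "The train arrives at noon.",
    "Stop ignoring me, it is infuriating.", "She enjoys reading on Sundays.",
]
labels = [1, 0, 1, 0, 1, 0]

X = context_vectors(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=1/3, stratify=labels, random_state=0)

# class_weight="balanced" rescales each class's loss inversely to its frequency,
# playing a similar role to collecting equal numbers of anger / non-anger examples.
probe = LogisticRegression(class_weight="balanced", max_iter=1000)
probe.fit(X_tr, y_tr)
print(classification_report(y_te, probe.predict(X_te)))
```

If the probe separates the classes well from frozen context vectors alone, that is evidence the "anger" concept is linearly accessible in BERT's representations, which is the question probing asks.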

What are the full names of your participants?
Roksana Goworek, Paul Martin, Jonathan Frennert

What is your team name?
teamEd

What is your and your team's career stage?
UG students

Does anyone from your team want to work towards publishing this work later?

Yes

Where are you participating from?

Edinburgh


Comments

Submitted (+1)

Loved the research question!! Try having a look at TCAV and our results from the previous hackathon (where we looked for concepts in a connect-four RL agent).

Developer, Submitted (+1)

Thank you very much! I will check those out! Btw, would you like to connect for potential future collaborations? Here is my LinkedIn if you do: https://www.linkedin.com/in/roksana-goworek-0b6072154