The Interpretability Hackathon

Hosted by Esben Kran, Apart Research, Neel Nanda · #alignmentjam


A jam submission

Probing Conceptual Knowledge on Solved Games

TCAV, RL, connect-four

Submitted by mentaleap — 6 hours, 1 minute before the deadline


Results

Criteria	Rank	Score*	Raw Score
ML Safety	#3	3.214	3.214
Judge's choice	#4	n/a	n/a
Reproducibility	#5	4.214	4.214
Novelty	#13	2.857	2.857
Interpretability	#14	2.929	2.929
Generality	#20	2.286	2.286

Ranked from 14 ratings. Score is adjusted from raw score by the median number of ratings per game in the jam.

Where are you participating from?
["Online"]

What are the names of your team members?
Amir Sarid, Bary Levy, Dan Barzilay, Edo Arad, Itay Yona, Joey Geralnik

What is your team name?
mentaleap

Comments

mentaleap (Developer), 1 year ago

We explored how a deep RL agent uses human-interpretable concepts to solve Connect Four.

Building on the paper 'Acquisition of Chess Knowledge in AlphaZero' by DeepMind and Google Brain, we used TCAV to explore concept detection in an RL agent for Connect Four.

Our agent architecture was inspired by AlphaZero, and the agent was trained using DeepMind's OpenSpiel library.

Our novelty lies in the decision to study Connect Four, which was solved with a knowledge-based approach in 1988. This means that, to some extent, we understand this game better than chess!
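
For readers unfamiliar with TCAV, the idea can be sketched in a few lines. This is a minimal illustration on synthetic data, not the team's code: the activations, the concept direction, and the toy value head (`head_w`) are all assumptions made up for the example, and the linear probe is a plain logistic regression written in NumPy. A concept activation vector (CAV) is the normal of a linear classifier separating activations of concept examples from random examples; the TCAV score is the fraction of inputs whose output gradient points along that vector.

```python
import numpy as np


def fit_cav(concept_acts, random_acts, steps=500, lr=0.1):
    """Fit a logistic-regression probe and return its unit-norm weights (the CAV)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept present)
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient step on the log loss
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)


def tcav_score(grads, cav):
    """Fraction of examples whose output gradient has a positive projection on the CAV."""
    return float(np.mean(grads @ cav > 0))


# Synthetic stand-in for layer activations (assumption: 8-dimensional layer).
rng = np.random.default_rng(0)
dim = 8
concept_dir = np.zeros(dim)
concept_dir[0] = 1.0  # hypothetical direction encoding the concept

# Positions where the concept holds are shifted along concept_dir.
concept_acts = rng.normal(size=(200, dim)) + 3.0 * concept_dir
random_acts = rng.normal(size=(200, dim))

cav = fit_cav(concept_acts, random_acts)

# Toy value head (assumption): value = tanh(acts @ head_w).
# Its gradient with respect to the activations is (1 - tanh^2) * head_w.
head_w = concept_dir.copy()
acts = np.vstack([concept_acts, random_acts])
grads = (1.0 - np.tanh(acts @ head_w) ** 2)[:, None] * head_w

score = tcav_score(grads, cav)
print("cosine(CAV, concept direction):", float(cav @ concept_dir))
print("TCAV score:", score)
```

In a real setting, the activations would come from a chosen layer of the trained agent on labeled board positions, and the gradients from backpropagating the value or policy output to that layer.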

