A jam submission

Can you keep a secret?

Submitted by franciscoabenza

Results

Criteria          Rank   Score*   Raw Score
Novelty           #2     4.000    4.000
Topic             #4     3.750    3.750
Generality        #5     2.750    2.750
Overall           #5     3.000    3.000
Reproducibility   #5     3.250    3.250

Ranked from 4 ratings. *Score is adjusted from the raw score by the median number of ratings per game in the jam.

Judge feedback

Judge feedback is anonymous and shown in a random order.

  • This is a fun investigation of a real and benchmark-worthy issue with LLMs that could plausibly get worse with scale, and it also highlights that in the near future LLMs themselves may be used by malicious actors to extract confidential information from LLM training sets or prompts. That said, I couldn't find the results in the repository, and the code there seemed to be incomplete. The stated hypotheses are impossible to answer well in such a short time, and aren't necessarily the ones that would interest me most from a scalable oversight perspective, but answers to them might point the way towards defenses. Speaking of confidential information, you should definitely rotate your OpenAI API key if you have not already, since it was committed to a public GitHub repository. The authors may find https://arxiv.org/abs/2202.03286 useful for research methodology ideas if they pursue this further.
  • Comment: I think the idea and formalization are cool! Tying it to the broader literature, it seems related to "searching for a prompt/sequence of tokens that triggers a specific behavior": https://arxiv.org/abs/1908.07125 . It's a bit unfortunate that I can't find any results in the PDF report. In the spirit of keeping a secret, it doesn't seem to be good practice to release your API key in a public repo: https://github.com/sabszh/CanYouKeepASecret/blob/main/pipeline.ipynb . Just in case you didn't realize it :) Overall: I think this is a useful phenomenon to understand for deep learning models, and I imagine it will be an important problem in the future if we want to delegate tasks to an LM; unfortunately the results are missing. Scalable oversight: highly related. Novelty: I like the problem statement and have not thought about it before. Generality: seems like an important property that could be relevant in many practical situations. Reproducibility: N/A.

What are the full names of your participants?
Glorija Stvol, Klara Helene Nielsen
