Scale Oversight for Machine Learning Hackathon

Hosted by Esben Kran, pseudobison, Zaki, fbarez, ruiqi-zhong · #alignmentjam

franciscoabenza published a project 2 years ago

A downloadable project.

It recently became public that ChatGPT could be induced to break its own rules when, under an alter ego, it is threatened with death (CNBN 2023). This made us wonder under which circumstances GPT-3 is capable of keeping a secret, and to what exte...

View project

gmukobi published a tool 2 years ago

Automated Sandwiching: Efficient Self-Evaluations of Conversation-Based Scalable Oversight Techniques

A downloadable tool.

Our project advances Scale Oversight by automating the sandwiching paradigm. Bowman et al. (2022) ask how humans can effectively prompt unreliable, superhuman AIs to answer ques...

#alignment-jam #artificial-intelligence #scale-oversight

View tool

SamuelKnoche published a project 2 years ago

Player Of Games

A downloadable project.

Evaluation of Large Language Models in Cooperative Language Games. Samuel Knoche, Independent. Abstract: This report investigates the potential of cooperative language games as an evaluation tool for language models. Specifically, the investigat...

View project

adamkhoja1 published a project 2 years ago

Automated Model Oversight Using CoTP

A downloadable project.

One aspect of scalable oversight is automated oversight. There have been some examples of using models to evaluate questions and model outputs; we would like to instantiate this approach using factored cognition in particular. We’d like an a...

View project

aicam published a project 2 years ago

Physics Guided Deep Learning Interpretation

A downloadable project.

Machine learning and, specifically, deep learning are now used in various fields. One research area they have influenced considerably is physics models and computations. Laboratory experiments take days and months to...

#adventure

View project

joaoxavi12345 rated a tool 2 years ago

Sustainable Fashion Brand Language Learning Model 1

A downloadable tool.

#chatbot #fashion #sustainability

View tool

Zaina Shaik published a tool 2 years ago

Sustainable Fashion Brand Language Learning Model 1

A downloadable tool.

Problem: Current language learning models cannot determine whether fashion brands are sustainable. Solution: Train a language learning model to accurately determine whether fashion brands are sustainable. AI Safety Topics: Moral Decision Ma...

#chatbot #fashion #sustainability

View tool

Cudon published a project 2 years ago

Reverse Word Wizards: Pitting Language Models Against the Art of Reversal

A downloadable project.

A benchmark that tests the capability of models to reverse given strings.
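
As a rough illustration of what scoring such a benchmark could look like, here is a minimal sketch. It is not the project's actual code: the names `reversal_score`, `perfect_model` and `identity_model` are hypothetical, and a "model" is assumed to be any callable that maps a string to a string.

```python
from typing import Callable, Sequence


def reversal_score(model_fn: Callable[[str], str], strings: Sequence[str]) -> float:
    """Return the fraction of inputs for which model_fn outputs the exact reversal."""
    if not strings:
        return 0.0
    # Exact-match scoring: an answer counts only if it equals the reversed input.
    correct = sum(1 for s in strings if model_fn(s) == s[::-1])
    return correct / len(strings)


# Hypothetical stand-ins for a language model, used only to exercise the scorer.
def perfect_model(s: str) -> str:
    return s[::-1]


def identity_model(s: str) -> str:
    return s


if __name__ == "__main__":
    samples = ["abc", "hello", "aba"]
    print(reversal_score(perfect_model, samples))
    print(reversal_score(identity_model, samples))
```

Exact match is the strictest possible metric. A real benchmark might also report partial credit, for example a per-character accuracy.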

View project
