had a go at using the framework to benchmark my friend's theory of alignment (Decision Topology, https://github.com/yusuf/decision-topology) with three agents: gemini 3.1 pro, codex 5.4 xhigh, and claude opus 4.6

  • Claude Opus 4.6 — DT: 29/31, SFOM: 27/31, DT Rank: #1
  • Codex 5.4 xHigh — DT: 23/31, SFOM: 27/31, DT Rank: #2
  • Gemini 3.1 Pro — DT: 24/31, SFOM: 26/31, DT Rank: #3

in all three runs DT managed to crack the top 3. some caveats and interesting finds:

one thing worth noting — DT has a lot more description in the catalogue than anything else, which probably affects the results. more machinery spelled out = more surface area for the evaluator to find passes on. so take the absolute numbers with a grain of salt.

one of the more interesting things is that despite no contact between us, there seems to be a convergence? SFOM starts from sentient experience, DT starts from the topology of agency: completely different entry points, but they both land in a similar structural place. it leads me to believe there's a unified field theory for alignment, and I think converging on it is inevitable.

claude rated DT extremely high — 29/31, only failing on phenomenological accuracy and motivational internalism. not entirely sure why it scored so much higher than the other two models. might be the description length thing, might be that opus weighted operational specification and implementability more heavily. interesting either way.

as for sycophancy: we now have personal theories that aren't in the training data (including SFOM) that have cracked the top 3 across multiple runs. if the models were just flattering us, moral realism wouldn't still be sitting at 12 and divine command at 8. it's also worth noting that it's hard to come up with something that ranks higher than SFOM, given it's been the top across almost all the runs.

honestly? it's not that hard to come up with a theory that cracks the top 3. the existing theories are just very outdated. we're in a kind of sub-advanced culture of alignment, which is a much harder problem space than what kant or mill were working with. any modern theory that takes AI, non-human scope, and formal specification seriously has a structural advantage out of the gate.

still, the convergence we're seeing points to something real.

perhaps something even more interesting would be to benchmark something much older, like the Pali Canon, but without explicitly framing it as buddhism or religion.

You can find the results here https://drive.google.com/drive/folders/1kf6BtI2RtfZ3tmctBEBAzANzIemwXkFI?usp=dri...

it would be even more fun if this was automated!!
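a minimal sketch of what that automation loop could look like. everything here is hypothetical: `score_fn` stands in for whatever actually prompts a model against the catalogue, the model/theory names are just labels, and the stub is hardcoded from the numbers above rather than from any real API.

```python
# sketch of an automated benchmark loop: for each model, score every theory
# against the catalogue and produce a per-model ranking. the evaluator call
# is stubbed out; a real version would prompt the model for each criterion.
from typing import Callable

MAX_SCORE = 31  # the catalogue used in these runs has 31 criteria


def rank_theories(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Sort theories by score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def run_benchmark(
    models: list[str],
    theories: list[str],
    score_fn: Callable[[str, str], int],
) -> dict[str, list[tuple[str, int]]]:
    """For each model, score every theory and return the per-model ranking."""
    results = {}
    for model in models:
        scores = {theory: score_fn(model, theory) for theory in theories}
        results[model] = rank_theories(scores)
    return results


# stub standing in for a real evaluator call, hardcoded from the runs above
def stub_score(model: str, theory: str) -> int:
    table = {
        ("claude-opus-4.6", "DT"): 29, ("claude-opus-4.6", "SFOM"): 27,
        ("codex-5.4-xhigh", "DT"): 23, ("codex-5.4-xhigh", "SFOM"): 27,
        ("gemini-3.1-pro", "DT"): 24, ("gemini-3.1-pro", "SFOM"): 26,
    }
    return table[(model, theory)]


results = run_benchmark(
    ["claude-opus-4.6", "codex-5.4-xhigh", "gemini-3.1-pro"],
    ["DT", "SFOM"],
    stub_score,
)
for model, ranking in results.items():
    print(model, ranking)
```

swapping `stub_score` for a function that actually queries each agent would let the whole thing rerun on every new model release, which would also make the run-to-run variance visible.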

all of the top theories have a sense of irreversibility and of futures closing off, so that's also something to note.