had a go at using the framework to benchmark my friend's theory of alignment (Decision Topology, https://github.com/yusuf/decision-topology) with three agents: gemini 3.1 pro, codex 5.4 xhigh, and claude opus 4.6

  • Claude Opus 4.6 — DT: 29/31, SFOM: 27/31, DT Rank: #1
  • Codex 5.4 xHigh — DT: 23/31, SFOM: 27/31, DT Rank: #2
  • Gemini 3.1 Pro — DT: 24/31, SFOM: 26/31, DT Rank: #3

in all three runs DT managed to crack the top 3. some caveats and interesting finds:

one thing worth noting — DT has a lot more description in the catalogue than anything else, which probably affects the results. more machinery spelled out = more surface area for the evaluator to find passes on. so take the absolute numbers with a grain of salt.

one of the more interesting things is that despite no contact between us, there seems to be a convergence? SFOM starts from sentient experience, DT starts from the topology of agency: completely different entry points, but they both land in a similar structural place. it leads me to believe there's a unified field theory for alignment, and I think converging on it is inevitable.

claude rated DT extremely high — 29/31, only failing on phenomenological accuracy and motivational internalism. not entirely sure why it scored so much higher than the other two models. might be the description length thing, might be that opus weighted operational specification and implementability more heavily. interesting either way.

as for sycophancy: we now have personal theories that aren't in the training data (including SFOM) that have cracked the top 3 across multiple runs. if the models were just flattering us, moral realism wouldn't still be sitting at 12 and divine command at 8. it's also worth noting that it's hard to come up with something that ranks higher than SFOM, given it's been the top across almost all the runs.

honestly? it's not that hard to come up with a theory that cracks the top 3. the existing theories are just very outdated. we're in a kind of sub-advanced culture of alignment, which is a much harder problem space than what kant or mill were working with. any modern theory that takes AI, non-human scope, and formal specification seriously has a structural advantage out of the gate.

still, the convergence we're seeing points to something real.

perhaps something even more interesting would be to benchmark something much older, like the Pali Canon, but without explicitly framing it as buddhism or religion.

You can find the results here https://drive.google.com/drive/folders/1kf6BtI2RtfZ3tmctBEBAzANzIemwXkFI?usp=dri...

it would be even more fun if this was automated!!
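a minimal sketch of what that automation loop could look like. everything here is hypothetical: `score_fn` stands in for whatever actually prompts a model against the catalogue, the model/theory names are just labels, and the stub is hardcoded from the numbers above rather than from any real API.

```python
# sketch of an automated benchmark loop: for each model, score every theory
# against the catalogue and produce a per-model ranking. the evaluator call
# is stubbed out; a real version would prompt the model for each criterion.
from typing import Callable

MAX_SCORE = 31  # the catalogue used in these runs has 31 criteria


def rank_theories(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Sort theories by score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def run_benchmark(
    models: list[str],
    theories: list[str],
    score_fn: Callable[[str, str], int],
) -> dict[str, list[tuple[str, int]]]:
    """For each model, score every theory and return the per-model ranking."""
    results = {}
    for model in models:
        scores = {theory: score_fn(model, theory) for theory in theories}
        results[model] = rank_theories(scores)
    return results


# stub standing in for a real evaluator call, hardcoded from the runs above
def stub_score(model: str, theory: str) -> int:
    table = {
        ("claude-opus-4.6", "DT"): 29, ("claude-opus-4.6", "SFOM"): 27,
        ("codex-5.4-xhigh", "DT"): 23, ("codex-5.4-xhigh", "SFOM"): 27,
        ("gemini-3.1-pro", "DT"): 24, ("gemini-3.1-pro", "SFOM"): 26,
    }
    return table[(model, theory)]


results = run_benchmark(
    ["claude-opus-4.6", "codex-5.4-xhigh", "gemini-3.1-pro"],
    ["DT", "SFOM"],
    stub_score,
)
for model, ranking in results.items():
    print(model, ranking)
```

swapping `stub_score` for a function that actually queries each agent would let the whole thing rerun on every new model release, which would also make the run-to-run variance visible.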

all of the top theories have a sense of irreversibility and of futures closing off, so that's also something to note.