So actually, there is another one that isn't mine or in the training data, because it was a friend's own personal theory (SKL), and it also came up higher on average but still didn't come out on top. Someone at the hackathon also wrote down a sophisticated version of their own moral theory and ran it; it didn't break the top 7, and SFOM still came out on top.
So yes, I think you make a great point, and that's one of the things that should be added to a more sophisticated version of the benchmark ranking system! (In fact, I wanted to automate it and create a site so people could add their own theories more easily, etc., to make the ranking and research loop continuous.) I think the other two friend theories coming up above the top 50% in ranking shows you might be right that the AIs have a sycophantic bias toward out-of-training-data theories. On the other hand, I thought they might have a bias toward theories in their training data, since they have far more information on those than on the personal summaries of out-of-training-data theories (like my own, the friends', and the hackathon submission), so it's not cut and dried which way the system would be biased.
I did talk about the limitations of this current rough version of the benchmark on the site, but I'd add all your concerns and more to it. So maybe instead of 2-3 non-training-data theories, we will have many more to reduce sycophancy. It is interesting to note, though, as a counter to your point, that among the 3 non-training-data theories SFOM still ended up on top. Although imperfect, I think that's a non-trivial signal that this should happen consistently.
I do think the criteria are also not the strongest. Again, they were LLM-generated at the time (to reduce my own bias) by asking the LLMs to come up with a thorough list of criteria for assessing and ranking moral theories. There are many criteria, however, that I think could be cut, merged, or refined, and I want to add quantitative criteria and more. With funding I'd also try to get expert philosophers to contribute and write their own criteria, etc. Plus, the automated website version would allow anyone to critique and submit their own criteria.
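To make the ranking mechanics concrete, here is a minimal sketch of the kind of criteria-based scoring loop described above. Everything in it is illustrative: the theory names, criteria, and scores are placeholders (in the actual benchmark the per-criterion scores would come from LLM judges, not hard-coded numbers), and `rank` is a hypothetical helper, not the real implementation.

```python
# Hypothetical sketch of a criteria-based theory ranking loop.
# Scores are placeholders; a real pipeline would have LLM judges
# score each theory summary against each criterion.
from statistics import mean

scores = {
    "SFOM":           {"coherence": 9, "generality": 9, "actionability": 8},
    "SKL":            {"coherence": 8, "generality": 7, "actionability": 7},
    "Utilitarianism": {"coherence": 7, "generality": 8, "actionability": 6},
}

def rank(scores):
    """Return theory names sorted by mean criterion score, best first."""
    return sorted(scores, key=lambda t: mean(scores[t].values()), reverse=True)

print(rank(scores))
```

An automated site version would just wrap this loop: user-submitted theory summaries and criteria go in, judge scores populate the table, and the ranking recomputes continuously.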
I think you're making a good point about the circularity of some of the criteria, but that's a whole other can of ouroboros to discuss in more detail!
Once again, as stated in the doc and site, the benchmark is really a rough proof-of-concept signal; all the theories are listed as summaries with an example. The SFOM summary there was a piece of my theory written in 2025 focusing on key components like the subjective-vs-objective morality resolution, but it was by no means exhaustive, and it is currently outdated as things have advanced substantially in the last year. So I will keep you posted as I begin to release more refined theory!
I would say there is some overlap with the EA theory and other theories before it, like moral naturalism, natural law, utilitarianism, deontology, etc., because I do think all of those theories contain kernels of truth / are special cases or subparts of a larger unified one. A lot of the theory should be compatible with past correct theory, but more cohesive and general, and yes, it should feel obviously true if it is obviously true.
One of the things I wanted to add to the benchmarking is moral scenarios, as you mention, so maybe you could write down which ones you think are worth testing and I'll add them to the future benchmarking system! :)
I feel bad that you feel you need more information to rate me accurately, because I agree! This submission is not a good reflection of the work / theory in a complete or updated sense, but I rushed it to participate because Defender wanted me to and I thought it would be a fun, low-stakes way to push myself forward. So I apologize if you feel your time was wasted, and I would recommend ignoring me / waiting until I publish somewhere more fully and thoroughly.
1. Totally fair criticism! In fact I think the pitch has much to improve upon
2. I put that there because most of my R&D has been monk-mode stealth, so sadly I didn't have much to show publicly, and explaining it would've taken up more of the 3 pages when I was running out of time. I also would rate my pitch very low in its current state.
3. I do think, however, that in the absence of other evidence, showing consistent and thorough persistence and engagement is a signal, obviously not as strong as pure outcomes. Although they aren't guarantees, I do think a lot of deep research discoveries / inventions come from people who spend a lot of time focusing on the problem from many angles before cracking it. Plus, it's also proof that the person isn't doing it for the money, etc. There is some signal, but I agree with you that it's not enough! The effort volume isn't meant to convince people of the idea, but of my dedication to it: that it's not some side hobby or fleeting idea I just had that I will drop at any moment.
Draft braindump Outline:
Goal: Continuously maximize the probability of following the space of optimal axiological trajectories for as many caring beings as possible.
Short term
Identified bottleneck we can solve: morality is still pre-formal, and we can speed up the axiological tech tree / formalization using modern science's theoretical trees, the internet hive mind, and ASI.
Short term how: reverse-engineer the measurable boundaries & constraints of good futures that are computationally irreducible to predict in instantiation but not in specification. I.e., formalize morality (by formalizing axiology) to make it computable, measurable & iterable systematically, essentially causing a phase transition in the crystallization of pre-scientific ethics. That will in turn cause a phase transition in how all caring patterns coordinate, by giving them the theory and technology to measure, predict, and guide the best axiological trajectories for themselves and their surrounding networks, recursively and exponentially: i.e., the Mettasplosion (a positive feedback loop of mettalignment from theory to caring beings).
Results: the Mettasplosion continues to advance & spread measurably (more people building on the theory, implementing it, spreading it, etc.), with measurable positive axiological changes in their trajectories (and those surrounding them), etc.
Deadline / failure metric: before exponential-tech-lead moral-error amplification causes x-doom (through technological warfare, rogue ASI, techno-dictatorship, etc.), the theory should be sufficiently developed (and hopefully proved) to persuade caring systems of arbitrary levels of capability / intelligence, including any ASI no matter how intelligent, and of course also any humans, governments, corporations, etc. Furthermore, we want it spread and implemented widely enough (we are the data in the limit) that we can positively bias the training data of all caring beings (including ASIs) ASAP. Failure is x-risk doom, or simply failing to minimize suffering / conflict as much as possible. Failure would also be if we play no part in accelerating the advent of this axiological phase transition and in fact hinder it.
Weak success metric: if we fail at the above but still, on net, accelerate the advent of axiological formalization / improvements in the axiological trajectories of caring beings, then we will have succeeded in a weaker sense.