So actually, there is another one that isn't mine or in the training data, because it was a friend's own personal theory (SKL), and theirs also came up higher on average, but still didn't come out on top. Someone at the hackathon also wrote down a sophisticated version of their own moral theory and ran it; it didn't break the top 7, and SFOM still came out on top.
So yes, I think you make a great point, and that's one of the things that should be added to a more sophisticated version of the benchmark ranking system! (In fact I wanted to automate it and create a site so people could add their own more easily, etc., to make the ranking and research loop continuous.) I think the other two friend theories ranking above the top 50% shows you might be right that the AIs have a sycophantic bias toward out-of-training-data theories... On the other hand, I thought they might have a bias toward theories in their training data, since they have a lot more information on those versus the personal summaries of out-of-training-data theories (like my own, the friends', and the hackathon submission), so it's not cut and dried which way the system would be biased.
I did talk about the limitations of this current rough version of the benchmark on the site, but I'd add all your concerns and more to it. So maybe instead of 2-3 non-training-data theories, we'll have many more to reduce sycophancy. But it is interesting to note, as a counter to your point, that among the 3 non-training-data theories, SFOM still ended up on top. Although imperfect, I think that's a non-trivial signal that this should happen consistently.
I do think the criteria are also not the strongest. They were LLM-generated at the time (to reduce my own bias) by asking the LLMs to come up with a thorough list of criteria to assess and rank moral theories. There are many criteria, however, that I think can be cut, merged, or refined, and I want to add quantitative criteria and more. With funding I'd also try to get expert philosophers to contribute and write their own criteria, etc. Plus, the automated website version would allow anyone to critique and submit their own criteria.
I think you're making a good point about the circularity of some of the criteria, but that's a whole other can of ouroboros to discuss in more detail!
Once again, as stated in the doc and site, the benchmark is really a rough proof-of-concept signal; all the theories are listed as summaries with an example. The SFOM summary there was a piece of my theory written in 2025, focusing on key components like the subjective-vs-objective morality resolution, but it was by no means exhaustive, and it is currently outdated, as things have advanced substantially in the last year. So I will keep you posted as I begin to release more refined theory!
I would say there is some overlap with the EA theory and other theories before it, like moral naturalism, natural law, utilitarianism, deontology, etc., because I do think all of those theories contain kernels of truth / are special cases or subparts of a larger unified one. A lot of the theory should be compatible with past correct theory, but more cohesive and general, and yes, it should feel obviously true if it is obviously true.
One of the things I wanted to add to the benchmarking is moral scenarios, as you mention, so maybe you could write down which ones you think are worth testing and I'll add them to the future benchmarking system! :)
I feel bad that you feel you need more information to rate me accurately, because I agree! This submission is not a good reflection of the work/theory in a complete or updated sense, but I rushed it to participate because Defender wanted me to and I thought it would be a fun, low-stakes way to push myself forward. So I apologize if you feel your time was wasted, and I'd recommend ignoring me / waiting until I publish somewhere more fully and thoroughly.
Viewing post in Formalizing Morality (Neodore) jam comments
I don't at all find it to have been a waste of time; it was an interesting read. I just don't feel like I understand what you're selling yet. I think the EAs are ridiculous and, while well-intentioned, absolutely rife with practical failings akin to those that have created many hells in recent history. I would want to be able to read your actual theory, even a several-sentence summary, to understand what is being posited.