all of the top theories have a sense of irreversibility and futures closing, so that's also something to note
had a go at using the framework to benchmark my friend's theory for alignment (Decision Topology, [https://github.com/yusuf/decision-topology]) with three agents — gemini 3.1 pro, codex 5.4 xhigh, and claude opus 4.6
- Claude Opus 4.6 — DT: 29/31, SFOM: 27/31, DT Rank: #1
- Codex 5.4 xHigh — DT: 23/31, SFOM: 27/31, DT Rank: #2
- Gemini 3.1 Pro — DT: 24/31, SFOM: 26/31, DT Rank: #3
in all three runs DT managed to crack the top 3. some caveats and interesting finds:
one thing worth noting — DT has a lot more description in the catalogue than anything else, which probably affects the results. more machinery spelled out = more surface area for the evaluator to find passes on. so take the absolute numbers with a grain of salt.
one of the more interesting things is that despite no contact between us, there seems to be a convergence? SFOM starts from sentient experience, DT starts from the topology of agency — completely different entry points but they both land in a similar structural place. it leads me to believe there's a unified field theory for alignment, and I think converging on it is inevitable.
claude rated DT extremely high — 29/31, only failing on phenomenological accuracy and motivational internalism. not entirely sure why it scored so much higher than the other two models. might be the description length thing, might be that opus weighted operational specification and implementability more heavily. interesting either way.
on sycophancy: we now have personal theories that aren't in the training data (including SFOM) that have cracked the top 3 across multiple runs. if the models were just flattering us, moral realism wouldn't still be sitting at 12 and divine command at 8. it's also worth noting that it's hard to come up with something that ranks higher than SFOM, given it's been the top across almost all the runs.
honestly? it's not that hard to come up with a theory that cracks the top 3. the existing theories are just very very outdated. we're in a kind of sub-advanced culture of alignment which is a much harder framework to address than what kant or mill were working with. any modern theory that takes AI, non-human scope, and formal specification seriously has a structural advantage out of the gate.
the convergence we're seeing points to something real.
perhaps something even more interesting would be to benchmark something much older — like the Pali Canon, but without explicitly framing it as buddhism or religion
You can find the results here https://drive.google.com/drive/folders/1kf6BtI2RtfZ3tmctBEBAzANzIemwXkFI?usp=dri...
it would be even more fun if this was automated !!
thanks for this
the pitch was a compressed version of something more developed. i have a full white paper (https://docs.google.com/document/d/1A24witYwcnOkOR6rEckL02KNrUr3CeEsXfO_btDibKg/edit?usp=sharing) that addresses most of what you flagged — the five layer architecture is fully specified, the training corpus and objective are laid out properly, and the case for training from scratch is built out in detail. attaching it.
on narrative coherence as a loss function: the skandha pipeline itself defines what coherence means. a continuation is coherent if each layer follows plausibly from the one beneath it: sensation from form, perception from sensation, mental formation from perception, consciousness from all three. coherence isn't a global judgment about whether output sounds natural; it's a structural property checkable at each layer boundary. when the model gets it wrong you can locate exactly where it failed. that makes it a tighter and more learnable objective than it might have seemed from the one sentence in the pitch.
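to make that concrete, here's a minimal sketch of what a per-boundary check could look like. all the names (SkandhaState, layer_coherence, plausibility) are invented for illustration, not taken from the white paper:

```python
# hypothetical sketch of coherence as a per-boundary check rather than one
# global "does this sound natural" judgment. all names are illustrative.
from dataclasses import dataclass

@dataclass
class SkandhaState:
    form: dict           # raw environment / body state
    sensation: dict      # valence readings derived from form
    perception: dict     # meanings derived from sensation
    formation: dict      # narrative attractor derived from perception
    consciousness: dict  # attention window built from all of the above

def plausibility(layer, given) -> float:
    # placeholder: in practice a learned conditional model p(layer | given)
    return 0.0

def layer_coherence(cont: SkandhaState) -> dict:
    """Score a continuation one layer boundary at a time, so a failure can
    be located at a specific boundary instead of just penalized globally."""
    return {
        "form -> sensation": plausibility(cont.sensation, given=cont.form),
        "sensation -> perception": plausibility(cont.perception, given=cont.sensation),
        "perception -> formation": plausibility(cont.formation, given=cont.perception),
        "all -> consciousness": plausibility(
            cont.consciousness,
            given=(cont.form, cont.sensation, cont.perception, cont.formation),
        ),
    }
```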
on adjacent work: ACM is the closest thing i've seen built. layered architecture, consciousness window, emotional valence variables. but they're layering consciousness onto qwen2-VL and whisper, models that already know they're AI systems. the consciousness module sits on top of that. mine requires the opposite — the model's ignorance of its own nature has to be structural, established at pretraining. you can't fine tune that out. ACM also drives behavior through emotional homeostasis, equilibrium seeking. mine is directional: something being pursued, something being avoided, a story in motion. different internal logic.
on GWT: it's the theoretical ancestor of what ACM is implementing. the spotlight metaphor, streams competing for conscious access. my consciousness window looks similar on the surface, but in GWT things outside the spotlight exist and are competing to enter. in my architecture things below the attention threshold don't exist as experience at all. that's closer to how the skandhas actually describe perception, and it's why what i'm building is better understood as a mirror for observing ego structure than a simulation of awareness.
on openclaw: yes, going to build the harness implementation first. someone else in the comments recommended the same thing.
on the team — the writer isn't just a writer. they're a translator between the monk and the cognitive scientist. buddhism has had 2500 years of constructively framing consciousness and how it maps onto reality. the cognitive scientist formalises that into something technically workable. the writer makes sure the phenomenological precision doesn't get lost in translation. the ml engineer and systems builder construct it.
the monk, the nde survivor, and the person with non-dual experience are all there for the same reason — each of them has been forced, through radically different circumstances, to directly engage with the ego process at a level most people never reach. near death, deep meditation, non-dual states: these are all moments where the virtual programme of the sense of self becomes visible because it's been disrupted or temporarily dissolved. that direct experiential knowledge is exactly what you need to pressure test whether the skandha layers are actually tracking something real. no amount of theoretical reading substitutes for someone who has watched the programme running from the outside.
genuinely would value your read on the white paper.
at a deeper level this is less about simulating consciousness and more about building a visible instance of the ego process, the thing that assembles a sense of self moment to moment, so we can actually observe how it works
Kind of like looking into a mirror, perhaps an even fancier way of meditating
Thank you for your comments, they were very helpful
On point 1: the strict "below threshold = absent" framing is a design choice; the architecture is trying to model the ego process that assembles within consciousness.
on point 2 — transformer attention and the narrative attractor operate at different levels. transformer attention is over tokens in a context window; the attractor is over experiential state across time. they're not competing. also worth noting that the information flow through transformers is pretty rich, and this explanation was very helpful: https://x.com/i/status/1965960676104712451. the internal states are interferometric and continuous in ways that aren't being leveraged, and the reason they're not is that helpfulness training flattens all of that richness into output. that's the problem.
On point 3 you're right, and i'm going to try modelling this with something like openclaw as a harness implementation first before committing to training from scratch. someone else recommended the same thing. cool blog post btw!
Artificial Experiential Models
The dominant paradigm in language model design treats the model as a tool: stateless between uses, fully attentive when called upon, optimized to produce maximally useful output in response to input. This paradigm has proven enormously productive for applications requiring knowledge retrieval, reasoning, and task completion.
It has proven less useful for applications requiring something more like presence.
When a person sends a message to another person, the response they receive is shaped by where that person is, what they are doing, how they are feeling, what they are preoccupied with, whether the message even registered. A message sent to someone deep in concentration lands differently than one sent during a moment of restlessness. A message from a close friend clears a higher threshold of awareness than a notification from an unknown contact. The response, when it comes, carries the texture of wherever the sender was when they read it.
Current AI systems have no equivalent of this. They are always fully present, always fully attentive, always optimizing for the same thing regardless of any prior state. This makes them reliable. It makes them recognizably inhuman.
I'm building a small language model embedded in a continuous experiential loop, a persistent simulation of moment-to-moment being such that its responses arise not from knowledge retrieval but from the state it is currently in when the message arrives. The model does not know it is a language model. It was never trained to be one.
How it works
A well-established finding in cognitive science is that conscious experience is not a representation of the full environment but a narrow construction built from whatever the attentional system is currently foregrounding. The room you are sitting in contains hundreds of sensory events at any moment: the pressure of the chair, the ambient temperature, sounds from outside, the weight of your own hands. Most of these do not enter awareness because attention is directed elsewhere. They are not suppressed. They simply do not exist as experience.
Rather than giving the model access to a full environment representation and expecting it to respond appropriately, give it access only to what its current attentional state would plausibly foreground. Everything else is not retrieved and not filtered; it simply is not there.
Attention is organized by what the system currently wants, fears, and is engaged with. I model this through a narrative attractor: a dynamic configuration of current craving, current aversion, and running narrative thread that determines which elements of the sensory field receive elevated attention weight. The attractor is not a stored personality profile. It is the current shape of the system's wanting, always in motion, stable enough moment to moment to constitute something that functions like a self.
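For illustration, a rough sketch of the attractor as a data structure and of how it could bias attention over sensory events; the field names and the weighting rule are assumptions, not the actual design:

```python
# rough sketch; field names and the weighting rule are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class NarrativeAttractor:
    craving: dict = field(default_factory=dict)   # what is being pursued -> intensity
    aversion: dict = field(default_factory=dict)  # what is being avoided -> intensity
    thread: str = ""                              # the running narrative, in plain text

def attention_weight(event_tags: set, base_salience: float,
                     attractor: NarrativeAttractor) -> float:
    """Sensory events touching a current craving or aversion get elevated
    attention weight; everything else keeps its base salience."""
    boost = sum(attractor.craving.get(t, 0.0) + attractor.aversion.get(t, 0.0)
                for t in event_tags)
    return base_salience + boost
```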
The system is organized around five layers.
Form is the raw physical state of the simulated environment: location, time of day, sensory inputs, body condition.
Sensation is the layer of continuous valence readings: energy level, comfort, arousal, affective tone. These are not emotions; they are the pre-emotional texture of the current moment.
Perception is where the current state acquires meaning: the room feels still, the hour feels late.
Mental formation is the narrative attractor itself: what is being pursued, what is being avoided, what story is running.
Consciousness is the window built from all preceding layers, containing only the highest-attention elements. This window is the only context the language model receives. It is small by design. Its smallness is what makes the system's behavior feel human.
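A minimal sketch of how the window could be assembled; the threshold and item cap here are placeholders rather than specified values:

```python
# minimal sketch of window assembly; threshold and item cap are placeholders.
def build_consciousness_window(scored_elements: list, threshold: float,
                               max_items: int = 5) -> list:
    """scored_elements is a list of (description, attention_weight) pairs
    drawn from the lower layers. Elements below the threshold are not
    filtered out of the window; they never enter it. What comes back is
    the only context the language model sees."""
    above = [(desc, w) for desc, w in scored_elements if w >= threshold]
    above.sort(key=lambda pair: pair[1], reverse=True)
    return [desc for desc, _ in above[:max_items]]
```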
Incoming messages enter the system as sensory events and are scored for their probability of breaking through the current attentional state. High absorption raises the threshold. High relevance lowers it. Messages that score below threshold receive no response, or a delayed one, because the system was too absorbed to notice. Messages in the middle range produce responses colored by whatever was already happening. Messages above the upper threshold trigger a full attractor transition before any response is generated.
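A sketch of that thresholding, with made-up cutoff values:

```python
# illustrative sketch of message break-through scoring; cutoffs are made up.
def classify_message(relevance: float, absorption: float,
                     lower: float = 0.35, upper: float = 0.75) -> str:
    """relevance and absorption are in [0, 1]. High absorption raises the
    bar a message has to clear; high relevance helps it clear that bar."""
    breakthrough = relevance * (1.0 - absorption)
    if breakthrough < lower:
        return "ignored_or_delayed"    # too absorbed to notice
    if breakthrough < upper:
        return "colored_response"      # answered from within the current state
    return "attractor_transition"      # full shift of narrative state first
```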
The system runs continuously. Every interval of real time advances the narrative state. Cravings resolve or intensify.
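The background loop itself could be as simple as this sketch, where advance_narrative stands in for the real state update:

```python
import time

def advance_narrative(state, elapsed: float):
    # placeholder: this is where cravings resolve or intensify, valence
    # readings drift, and the narrative thread moves forward
    return state

def run_loop(state, tick_seconds: float = 60.0):
    """Hypothetical background loop: the simulation advances whether or not
    anyone is messaging it."""
    while True:
        state = advance_narrative(state, elapsed=tick_seconds)
        time.sleep(tick_seconds)
```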
Why it has to be trained from scratch
Fine-tuning an existing large language model is not a viable path. Any model trained on standard pretraining corpora has already learned, at the weight level, that it is a language model, that its role is to process input and produce helpful output, that it exists in discrete episodes, and that it has access to broad world knowledge.
There is no reinforcement learning from human feedback. No helpfulness reward signal, no harmlessness optimization, no user satisfaction objective. The only reward signal is narrative coherence: does the continuation follow plausibly from the prior state?
Who kinda worked on this
The only person i know so far that worked on something adjacent to this would be @nearcyan and his
How we'll know if it worked
The success metric is simple: do people who interact with this system over multiple sessions describe the experience as qualitatively different from talking to a standard LLM?
What I need
$150K gets this to a testable prototype. That covers compute, inference costs for the continuous background loop and a small team.
People i need
ML engineer, systems engineer, a buddhist monk or someone who's attained non-dual consciousness / is just really self aware, a really good writer, cognitive scientist, someone who had a near death experience and came back.
