I can see that the problem you're pointing at is "a real thing".
As I see it, I have three main disagreements with the ideas laid out in your pitch:
1) The attention model of the mind. Based on my reading of writing by experienced meditators, it's not the case that "backgrounded" content is *absent* from perception. There's a meaningful sense in which "background" perception and "foreground" perception are computed differently, but it's not the case that "background" perception is equivalent to being blanked out.
2) The proposed model architecture. Transformers already have an "attention" mechanism: the attention pattern is computed by the model itself, from its learned weights - the attention vector *is* the pattern of attention. You haven't meaningfully convinced me that the additional "environment" input would be updated as effectively as the network updates its own attention weights, and from the perspective of the inner model it seems likely to be perceived as an unwelcome sidecar that it would learn to ignore, rather than something that meaningfully feels like a part of itself.
3) The claim that we can't do this with current LLMs. This seems the most critical issue to investigate - "a new model architecture" is *so often* what people think the answer to their problem is, and *so rarely* actually the answer. What harness-layer approaches have you attempted? Whether it's context engineering, prompt/template engineering, or multi-pass LLM call approaches, I can see ways each of these could apply to the problem of changing "experience". I wrote a blog post last week about how there's a lot of gain to be made in harness-layer tech in 2026. https://x.com/YanqingCheng/status/2030067318139801869
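To make point 2 concrete, here's a minimal single-head self-attention sketch (in plain NumPy, and purely illustrative - nothing here is from your pitch): the attention pattern is produced *from the input itself* via learned projections and trained end-to-end, which is why I'd expect it to adapt more readily than a bolted-on external input.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 8, 4

# Learned projection weights - in training these are updated by gradient descent.
W_q = rng.standard_normal((d_model, d_model)) * 0.1
W_k = rng.standard_normal((d_model, d_model)) * 0.1
W_v = rng.standard_normal((d_model, d_model)) * 0.1

x = rng.standard_normal((seq_len, d_model))  # token representations

# The attention pattern is computed from x itself, not supplied from outside.
q, k, v = x @ W_q, x @ W_k, x @ W_v
scores = q @ k.T / np.sqrt(d_model)
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
out = attn @ v

# Each row of `attn` sums to 1: the model's own pattern of attention over the input.
print(attn.sum(axis=-1))
```

The point of the sketch: the "attention" a transformer pays is endogenous - a function of its weights and its input - which is the sense in which the vector already *is* the pattern of attention.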
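And to make point 3 concrete, here's a rough sketch of one multi-pass harness-layer approach. Everything here is my assumption, not something from your pitch: `call_llm` is a stand-in for whatever chat-completion API you'd use, and the two-pass structure (answer, then rewrite a persistent "experience" summary) is just one way context engineering could change "experience" without a new architecture.

```python
from typing import Callable

def two_pass_turn(
    call_llm: Callable[[str], str],  # stand-in for any chat-completion API
    experience: str,
    user_msg: str,
) -> tuple[str, str]:
    # Pass 1: answer the user with the current "experience" summary in context.
    reply = call_llm(
        f"Background experience so far:\n{experience}\n\n"
        f"User: {user_msg}\nAssistant:"
    )
    # Pass 2: have the model rewrite its own experience summary,
    # so "what it has lived through" persists across turns.
    new_experience = call_llm(
        f"Previous experience summary:\n{experience}\n\n"
        f"Latest exchange:\nUser: {user_msg}\nAssistant: {reply}\n\n"
        "Rewrite the experience summary to reflect this exchange."
    )
    return reply, new_experience

# Usage with a stub in place of a real model:
stub = lambda prompt: f"[model output for {len(prompt)}-char prompt]"
reply, experience = two_pass_turn(stub, experience="(empty)", user_msg="Hello")
```

This kind of loop is cheap to prototype against any existing model, which is why I'd want to see it (or context/template-engineering variants) ruled out before reaching for a new architecture.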
I think the problem as you set it out is worth attempting to solve, but I don't think the solution proposed here is likely to work, or to be cost-effective compared to other potential approaches.