qingisbuilding

Recent community posts

cool, sounds like you are already thinking about and have answers to the questions I am curious about! 

> Each GMS layer maps directly to a failure mode observed in the research

this is the part I was intending to double click on! do you have some writing on this somewhere? sometimes an architecture goes from overwrought to elegant only once you take into account the problems you're trying to solve, and at other times a more elegant solution to the same problems is possible. 

and it's great that you've thought about the scaling path. do you have some validation with your ideal early adopter profile? if not, that seems like important feedback to get early (I'm not connected at all with blockchain folks so I don't really know what they think is valuable.) 

I'm not an expert in this domain, but I'll pass this to someone I know who is, to see if they'd be willing to give some feedback. 

as someone with only a passing interest, my main thoughts are:

- "biological neuro basis for novel network architecture" is an enormously oversubscribed field with relatively few good results - you lampshade this a bit but i think the pitch audience will be looking for concrete evidence your thing is exceptional because the baseline-crank-rate is so high. ("aeroplanes are great, but they'll fly even better if the wings flap!") 

- I'd love to hear more about your snake methodology. with a lot of skepticism to overcome (both about your specific domain and also the general state of ML research experimental techniques) you're battling priors of "did they make a mistake with the experiment" 

- the paper you linked to seems like a concrete and modest improvement to a specific technique. the research gap to get from there to Atari games seems pretty huge but I don't have info on where you are with your other puzzle pieces - being able to visually parse the games is useful but also the least interesting piece compared to the actual RL when it comes to success at the Atari stuff

- why snake, and why Atari? why are these useful benchmarks as your milestones towards your field-changing ambitions? AFAIK none of the original deepmind Atari techniques turned out to be of long term importance, so why is it a good proxy for what you are trying to achieve? 

thanks for sharing! my read from Marvin's output is that it's a fairly similar ratio of good insight to spurious pattern matching to incorrect leaps as I get from using standard claude to help me brainstorm. is the value of Marvin mainly in the way he's applied (which definitely has value potential), or also in specific harness asks you are giving him that give stronger results than just individuals using claude? I can see that you've asked him to make links between different projects, which is interesting - but it's also going to be valuable to figure out how to eval the effectiveness of the link-finding 

good questions! FYI we will have several form factors but currently the plan for the alpha is that it's a bolt on for claude code (slash commands, MCP, etc) with a small extra token spend via API (byo key) to run some non-claude-able extra function. 

I'm a bit out of my depth domain-wise with this one, so I will just flag the questions that jump out at me as an outsider: the record label thing seems out of place - I'm not sure how it fits in, both plan-wise and priority-wise. Overall the "4 arms" part of the pitch risks coming across as unfocused, distracting from the really concrete priorities that otherwise seem very well thought out.

but from what I can understand, I think the pitch here is well presented, and the concrete parts do seem really promising, so I hope to see some cool tech and results come out of this!

I think the high level idea is cool but this sentence jumps out at me: "Today, most people turn to Fox News, CNN, X, Reddit, or Discord to get your information about the world. Isolated feeds, algorithm delivered, extremely biased, and not very information dense." - one of these things is not like the others? A Discord server is a self-organising ingroup community, exactly the kind of atomic unit that seems to be at the bottom of your hierarchy. It's not like an algorithmic feed. People evidently like it, it's a popular form of organisation, and there is a fledgling network of inter-server connections that is maybe beginning to become the thing Virts is trying to achieve. I think a key angle of the pitch could be: "how is Virts different to a Discord server/community slack/group chat? what would make an ingroup that currently uses one of these switch over to Virts?" - spelling out this difference explicitly would help paint a more vivid picture of the vision.

This is super cool and I very much Want This Thing - so a sign that your pitch is clear! I like the section at the top of vision where you spell out an example user flow, that does a lot to generate the feeling of "yes!".

It's clearly both crazy ambitious and also eminently decomposable - it's clearly possible to prioritise out an MVP as long as you can make some progress on that core semantic composition piece of technology. You do a little bit of science/research and then suddenly unlock a lot of possible engineering. Seems like great timing to try to start this with a small team.

If there's anything else I'd like to see in this pitch, it's a competitive analysis - who else is doing things in this space? or if there is nobody, what's the nearest thing that might hit this value vector?

Overall very excited by your products! if the core tech works, I will buy them :)

I really like how evocative this part is: "Imagine Pokemon Go, except you go around town to see what thoughts you can find. A chatlog for life which you can use to leave notes anywhere in the world, but specifically for others in the same time and space." - definitely the best part of your pitch.

I do wonder if you'll have a density problem here - what's your expected user scale, and the expected concentration of users, needed to make this a fun experience? but this seems solvable if you focus your launch on particular cities first. 

But definitely a fun project to think about and I can see it creating something valuable for people!

This is a noble aim but I'm a bit put off by the pitch's focus, in its "why you" section, on _work_ rather than _results_. Anyone can put in many years, many words, many publications and fail to make any progress towards their stated goals. Someone who so heavily emphasises proof of *effort* rather than proof of *outcome* makes me concerned that the author is not aware that effort and results are often not correlated - the fact that they think this emphasis is justified itself causes me to update away from believing in their work. So I think this is an angle of the pitch that needs reworking to have people consider your idea more seriously.

I can see that the problem you're pointing at is "a real thing". 

As I see it I have three main disagreements with the ideas laid out in your pitch:

1) The attention model of the mind. Based on my readings of writings by experienced meditators, it's not the case that "backgrounded" content is *absent* from perception. There's a meaningful sense in which "background" perception and "foreground" perception are computed differently, but it's not the case that "background" perception is equivalent to being blanked out.

2) The proposed model architecture. Transformers already have an "attention" mechanism - the attention lives in the weights of the model; the vector *is* the pattern of attention. You haven't meaningfully convinced me that the additional "environment" input would be updated as effectively as the network's own mechanism updates the attention weights, and from the perspective of the inner model it seems likely to be perceived as an unwelcome sidecar it would learn to ignore, rather than something that meaningfully feels like "a part of itself".

3) The claim that we can't do this with current LLMs. This seems the most critical issue to investigate - "a new model architecture" is *so often* what people think the answer is to whatever problem they're trying to solve, and *so rarely* actually the answer. What harness layer approaches have you attempted for this problem? Whether that's context engineering, prompt/template engineering or multi-pass LLM call approaches, I could see ways each of these might apply to the problem of changing "experience". I wrote a blog post last week about how there's a lot of gain to be made in harness layer tech in 2026. https://x.com/YanqingCheng/status/2030067318139801869
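To make point 3 concrete, here's a minimal sketch of what I mean by a multi-pass harness approach - a first pass decides what should be foregrounded, a second pass answers conditioned on that, so the harness rather than the weights controls "attention". The `call_llm` stub and the prompt wording are my own hypothetical placeholders, not any real API:

```python
# Minimal sketch of a multi-pass "harness layer" approach to shifting a
# model's effective attention without a new architecture. call_llm is a
# placeholder stub standing in for whatever model API you actually use.

def call_llm(prompt: str) -> str:
    # Stub: a real harness would call a model API here.
    return f"[model response to: {prompt[:40]}...]"

def foreground_pass(context: str, query: str) -> str:
    # Pass 1: ask the model which parts of the context deserve focus.
    return call_llm(
        f"Given this context:\n{context}\n"
        f"List the details most relevant to: {query}"
    )

def answer_pass(context: str, query: str, focus: str) -> str:
    # Pass 2: answer with the selected details injected as explicit
    # focus - the harness decides what is "foregrounded", not the weights.
    return call_llm(
        f"Context:\n{context}\n\nFocus on:\n{focus}\n\nQuestion: {query}"
    )

def harness(context: str, query: str) -> str:
    focus = foreground_pass(context, query)
    return answer_pass(context, query, focus)
```

The point of the sketch is just that "changing experience" can plausibly be an orchestration problem rather than an architecture problem.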

I think the problem as you set out is worth attempting to solve, but I don't think the solution proposed here is likely to work, or cost effective compared to other potential approaches.

This is a strong pitch for how succinct it is. I can understand what you're trying to achieve. If it really works out it would clearly be a >>billion dollar business. 

The pitch as it stands treats the algorithm itself as a bit of a black box, and the algorithm is clearly The Thing here - to inspire more confidence I think it would be good to talk more about "how you know it does what you claim" or "why you have the expertise to make this work" or something in that area. as it stands there's very little in your pitch that a crank who wants to claim they are doing something, but doesn't actually have something real, couldn't also say - you need evidence to make potential investors and partners Bayesian-update. if you don't have strong evidence yet, "what evals we're planning to run" could also work - a plan to get the evidence. 

hey sunrise, thank you so much for the feedback! please can you drop me an email at yanqing@tollens.ai about signing up for the alpha so I can get you onto our email lists? 

I'm not the intended audience for this thing since I'm not an astrologer at all, but I appreciate that the ask is straightforward and concrete - I can see that the thing you are trying to make could potentially be well received by its target audience, even if that audience is not me! 

That may be an assumption worth validating, actually - a way to make this pitch more compelling to outsiders would be some evidence that you have or are planning to do "market research" of your target audience to demonstrate that there are other astrological researchers who would be interested in subscribing to such a publication. 

But I can see that this is clearly a passion project for you with a concrete output that is of learning value (potentially not even just to astrologers, since astronomical maths and science is such a fascinating topic) so I think this is the starting point for something concrete with real value to some people. 

This is a cool pitch! It rhymes with something I've been thinking about for Tollens - that a software project needs some kind of record for its historical quality decision making - "why did we do this and not that", version control for "why" not "what". 

I didn't dig into the full technical details of what you are proposing, but I would love to see more explanation of that "why" thinking here - since you are presenting architecture, it is always useful for the audience if you include "why" in support of your "what"! 

One other question that comes to mind as I read your pitch is "what organisations should use this, and what organisations should not?" - it seems at first glance a potentially quite heavyweight mechanism, and I can imagine many kinds of organisation where this level of administrative overhead is not worth the value that a rigorous audit trail system could bring.

I think your objective is worthwhile and I like that you have clearly described your top down approach. I do wonder how you will avoid common failure modes of community-based projects, and would love to see evidence that you have researched this and considered common problems as part of your pitch. in my experience, both as a group member and as an observer of groups, this kind of group with unclear power structures ends up becoming an attractor for:

1) bad actors/manipulators/"cult leaders" who consciously or unconsciously act to gain power over the group rather than for the goals of the group

2) highly pedantic or ideological people who fracture the group by causing other members to invest more time and effort than they want into petty matters and minor points of disagreement

3) narcissistic people who thrive on attention and drama.

Traditional church communities (and even hobby groups, group houses etc) are often full of these, and as soon as one of these attractors takes hold, "sane normal people" begin to leave. I would love to hear how you intend for your groups to be robust and antifragile to destructive social dynamics like these, such that your overall organisation could have longevity.

I think the goals of this project are really noble and well described. 

Given that you have a live website already, I would love to hear more about your thinking on execution as part of this pitch. To what extent is your current website succeeding at your goals? Have you succeeded in attracting *and retaining* users? (my intuition is that user retention would be a potential big barrier to success - what makes someone come back? so I'd like to see evidence that this is something you're either working on or have evidence it's not a problem) 

Basically, I think the project is cool and I'd like to see as part of your pitch a concrete assessment of where the bottlenecks are because that should shape your next steps, and give the partners you are looking for confidence that they are joining a project with longevity. 

hey, just wanted to let you know that you seem to have not enabled scrolling in the HTML embed

This is a cool pitch! I want this thing to exist

One specific thing I like about it is that you list lots of interesting problems in the motivation section, each of which got me intrigued but made me think they were different domains. Together, though, they primed me to be in the right semantic space to receive your proposed idea, which was concrete enough for me to visualise. 

As a maybe relevant adjacent thing, are you familiar with the "bullet comments" format popular on Chinese social video sites? this achieves some but not all of the goals that you're aiming for, but it's interesting to see western audiences react to it - lots of people have a visceral "how could anyone watch anything like this" feeling about the UI, even though people familiar with it like it. I think you're going to have this challenge to overcome with any tool that meaningfully changes the phenomenology of engaging with a live experience - bridging from the audience's expectations. 

As I see it there are three disjointed concepts in this pitch and I am not sure how they relate to each other. 

- I find the original research statement on the bottom slide very interesting, so I would have loved to see more about *how* the Marvin AI claims to satisfy the goals of this research 

- Likewise, I would like to understand *how* the crypto project aids in the success/goals of either the Marvin AI or the original research 

Updated based on feedback to add sections:

- bolts on to your existing workflow
- Tollens vs growing your QA team
- who shouldn't use Tollens


Tollens: the quality layer for agentic coding [WIP]

One-liner: AI coding lets you ship faster than you can understand what you've built. Tollens keeps you in control.

The problem: You're building something amazing, powered by agentic AI to help write your software as fast as you can dream it. Your codebase is growing faster than your team's ability to reason about what it does, where it breaks, and what happens when it fails. Quality is value to people who matter - your users, your team, your stakeholders. Right now many software teams do not have confidence in what quality they are delivering.

What Tollens does: Real quality work is more than just checklists and test coverage. Quality understanding comes from investigation: exploring what your product actually does, questioning assumptions nobody thought to question, spotting risks that don't show up in a pipeline dashboard. Tollens is an AI toolset that does this adversarial thinking. It flags contradictions, surfaces gaps in understanding, notices when metrics are telling a story nobody's reading, and escalates when a human genuinely needs to make the call - from things as small as "is this UX actually slick to a real user?" to "is this bug a reputation risk we can afford to take?".

Tollens bolts on to your existing AI-native workflow and self-discovers your team's quality context by exploring your codebase and asking questions when it's not sure. Tollens is built around a machine-actionable quality schema: a way of encoding what your team actually cares about (priorities, risks, user expectations) so that AI agents and humans alike can reason about quality based on shared understanding.
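For illustration, a machine-actionable quality schema entry might look something like the sketch below. All field names and values here are my own hypothetical examples, not Tollens's actual format:

```python
# Hypothetical sketch of a machine-actionable quality schema: encoding
# priorities, risks and user expectations as structured data that both
# humans and AI agents can reason over. Illustrative only - not
# Tollens's real schema.

quality_schema = {
    "product": "checkout-service",
    "priorities": [
        {"name": "payment correctness", "rank": 1},
        {"name": "p99 latency under 300ms", "rank": 2},
    ],
    "risks": [
        {
            "description": "race condition on double-submit",
            "impact": "customer charged twice",
            "tolerance": "never",
        },
    ],
    "user_expectations": [
        "a failed payment always shows a clear error, never a spinner",
    ],
}

def highest_priority(schema: dict) -> str:
    # With the context encoded, an agent can mechanically answer
    # "what matters most here?" instead of guessing.
    return min(schema["priorities"], key=lambda p: p["rank"])["name"]

print(highest_priority(quality_schema))  # -> payment correctness
```

The point is that once this context is explicit and structured, both a human reviewer and an AI agent are reasoning from the same shared understanding.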

Tollens and your QA org: We don't think you can take human judgement out of quality, and we don't think AI is close to changing that. Tollens is your quality team's best assistant, not a standalone quality team. Think of it as what Claude Code is doing for devs, but for QAs. 

What that looks like depends on who's using it. A junior QA paired with Tollens learns judgment faster, because Tollens asks the hard questions and the junior has to go gather the right input from stakeholders. A senior QA paired with Tollens can achieve much higher impact by amplifying their judgment with Tollens's observation and automation. A QA lead paired with Tollens can confidently generate holistic quality insights to present to their stakeholders, without worrying about whether their team is aligned on the product's underlying quality assumptions.

Who shouldn't use Tollens: Teams that haven't adopted AI coding tools yet. If you're still figuring out how to actually use AI-generated code in your workflow, your bottleneck isn't quality, it's adoption. Entrenched orgs that are going to get eaten by AI-native newcomers before they can adapt aren't our problem to solve. Tollens is for teams already moving fast with AI and feeling the vertigo.

Why we can have impact: The big labs are raising the waterline rapidly: every base model update can handle more software tasks out of the box. But "software" isn't one problem. There are many genuinely hard tasks within software engineering. For example: running five-"9"s (99.999% reliable) telecoms infrastructure, writing embedded firmware you can't patch after deployment, building financial systems where a race condition costs real money, or certifying safety-critical software where someone might get hurt. Out-of-the-box models currently aren't capable of getting the right quality approach to these domains. (I wrote more on this in How fast will AI get better at software?)

We come from one of these domains: three co-founders from the Metaswitch diaspora. We've built trusted software for telecom networks in the cloud. We know what quality thinking looks like when "what if this breaks?" means someone can't dial 999. We've built a quality culture that is just as suited for architecting for five "9"s services on top of three "9"s dependencies as for brutally weighing up exactly what isn't needed for an alpha release to two friendly customers.  

The high-difficulty domains are the summit of the quality mountain. Right now, we're on the foothills. Even on straightforward projects, AI-generated code often ships without considering reliability, supportability, maintainability, security... Tollens scales our judgment with AI tooling, so we can help you get this right whatever your domain.

The timing is also perfect for us to build this capability without being a part of a big lab. The value in AI has shifted to the harness layer: orchestrating models, not training them. (The Harness Layer explains our thinking.)

Next 1-3 months: Super ambitious timeline. Closed alpha launching in ~1 month, targeting ~30 early users: vibecoders with solo noncommercial projects. In ~2 months, design partner trials with real startups, seeing how well we stand up to handling real quality debt in real codebases. By June we want to have validated that Tollens can surface genuine quality insights on codebases it's never seen before, with minimal setup.

Long-term vision: Right now AI-generated code has a reputation as "slop". We want to change that. My hope is that Tollens teaches quality thinking to the software world. My vision is a world where people trust AI-engineered software the way they trust Waymos. Success looks like our way of thinking about quality becoming the default for how teams build with AI.

Resources needed: Mainly feedback on this pitch. "Quality" means something specific to us and something generic to most people, and every time I explain Tollens I end up writing an essay. I want to get to where I can say this in 30 seconds on a call and have the other person get it. Also interested in connecting with CTOs or engineering leads at fast-growing startups who are feeling the quality gap as they adopt AI coding tools - I need design partners to figure out if what we're building works! If you know someone, I'd love to talk.