Postmortem

A browser game made in HTML5

Posted June 24, 2026 by Bulent Yusuf

How we got here

Standard practice on a designed game is to write a design document up front: either a fat one covering vision, every mechanic, controls and scope, or a lean one-page pitch plus a living wiki that grows with the build.

BIG LIZARD had none of that.

The game began as a reverse-engineering exercise with no brief, and the design accreted through build-and-playtest. This is a legitimate way to make a small arcade game, but it comes with a failure mode: scope creep, indecision and wasted tangents.

The method below is what was used in place of an up-front document, and it also helped keep that failure mode in check.

The working method

The build ran as a tight loop between a human designer and an AI copilot.

Me, I'm the human. I owned the art, the design intent, every gameplay decision, and final judgement on whether something felt right. Sprites and the layout of the playing-fields were mine to tinker with. Audio effects and music were validated by my ears, since an AI cannot hear the output.

The AI owned implementation only: the game logic in Lua, plus the sound effects and the music, which were written to an agreed spec and never went beyond it.

Every change followed a fixed four-step cycle: propose, agree, build, validate. A change was described and argued for first. It was built only once explicitly agreed. It was then validated before being handed back.

The single most important rule was that nothing got built before agreement was confirmed, because in the absence of an up-front design document, that agreement step was the design document, applied one decision at a time.

In hindsight, this probably wasn't the most efficient way to create my first game. Together, we ended up with roughly 160 different builds before we were ready to ship v1.0. But the journey was as fun as it was educational.

The validation harness

BIG LIZARD is dense with interacting systems and a fixed three-failure economy. A mechanic that creates an unfair trap state is therefore a real risk, and not one that is always visible by eye. So any claims to fairness had to be tested with brute force.

A test harness re-implemented the game's logic outside PICO-8 and soaked it across hundreds of thousands of randomised situations, looking for states the player could be put in with no fair way out. If a build created a trap state and couldn't pass a soak, then it went back to the drawing board.
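
To give a flavour of what a soak looks like, here is a minimal Python sketch. It is not the harness used on BIG LIZARD. The one-row playfield, the three-move rule and every name in it (WIDTH, spawn_hazards, soak and so on) are assumptions made up for illustration. The idea it shows is the same, though: generate a huge number of random situations and collect any in which the player has no safe move.

```python
import random

WIDTH = 8  # hypothetical playfield width in columns (an assumption)


def safe_moves(player_col, hazard_cols):
    """Columns the player can step to (left, stay, right) that hold no hazard."""
    candidates = (player_col - 1, player_col, player_col + 1)
    return [c for c in candidates if 0 <= c < WIDTH and c not in hazard_cols]


def is_trap(player_col, hazard_cols):
    """A trap state is one in which every legal move lands on a hazard."""
    return not safe_moves(player_col, hazard_cols)


def spawn_hazards(rng, max_hazards):
    """Hypothetical spawn rule: between 0 and max_hazards distinct columns."""
    count = rng.randint(0, max_hazards)
    return frozenset(rng.sample(range(WIDTH), count))


def soak(trials, max_hazards, seed=0):
    """Run randomised situations and return every trap state found."""
    rng = random.Random(seed)  # seeded so a failing soak can be replayed
    traps = []
    for _ in range(trials):
        player_col = rng.randrange(WIDTH)
        hazards = spawn_hazards(rng, max_hazards)
        if is_trap(player_col, hazards):
            traps.append((player_col, hazards))
    return traps


if __name__ == "__main__":
    for limit in (1, 2, 3):
        found = soak(100_000, limit)
        print(f"max_hazards={limit}: {len(found)} trap states")
```

In this toy model a single hazard can never block every move, so a spawn rule capped at one hazard passes the soak. Raising the cap to two lets a corner get sealed off, and the soak reports it. A build that produced that kind of result is the one that would go back to the drawing board.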

Separately, a parser was used to check the cart still parsed cleanly and to measure the relative size cost of a change. One hard-won rule governs that size measurement. The external parser systematically over-reports the token count by a fixed margin, so it was trusted only for relative deltas and parse-checking, never for the absolute number.

The absolute token figure was always read from PICO-8's own editor, which is the ground truth.
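
The reason the parser is still useful for deltas is simple arithmetic: if every reading is too high by the same fixed amount, the error cancels when two readings are subtracted. A small Python sketch of that reasoning follows. The function names and the sample numbers are hypothetical, and the only outside fact relied on is PICO-8's 8192-token cap per cart.

```python
TOKEN_LIMIT = 8192  # PICO-8's token cap per cart


def relative_cost(parser_before, parser_after):
    """Token delta of a change. A constant over-report cancels in the subtraction."""
    return parser_after - parser_before


def projected_total(editor_before, parser_before, parser_after):
    """Estimate the post-change total from an editor reading plus a parser delta.

    This is only a projection: the editor's own figure after the change
    remains the ground truth.
    """
    return editor_before + relative_cost(parser_before, parser_after)


def headroom(total_tokens):
    """Tokens left before the cart hits the limit."""
    return TOKEN_LIMIT - total_tokens


if __name__ == "__main__":
    # Made-up readings: the editor says 7000, the parser over-reports by 37.
    editor_before = 7000
    parser_before = 7037
    parser_after = 7157
    total = projected_total(editor_before, parser_before, parser_after)
    print("delta:", relative_cost(parser_before, parser_after))
    print("projected total:", total)
    print("headroom:", headroom(total))
```

The projection is a convenience for deciding whether a change is worth attempting. It does not replace reading the final number in the editor.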

What went right

Randomisation at the right scope. A late-game boss shuffle originally re-rolled attack types per wave, which produced difficulty spikes I hadn't tested or intended. Moving the randomisation down to the individual turn fixed it. The lesson generalised: randomisation has to operate at the scope the design actually intends, not wherever it's easiest to bolt on.
Simplifying instead of layering. When a flame effect and a hazard collided awkwardly, the instinct was to add more interaction logic to handle the overlap. The chosen fix was the opposite: limit the height of the flame so the conflict could not arise. Reaching for less rather than more was right more often than not, and it was correspondingly kind to the token budget.
Letting playtests decide. A punisher mechanic aimed at players who panic-roll was specced and then dropped, because the thing it was meant to fix turned out to be pre-release jitter rather than a real problem confirmed by the playtests. Holding back from building a solution to a problem that didn't exist saved tokens and complexity.
The opposite-half bonus spawn. An early version spawned the bonus ram from a fixed edge every time, which let players pre-position and trivialised it. Spawning it from the opposite half forced a chase and restored the tension. A small rule change with an outsized effect on feel.
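
The spawn rule in the last item is small enough to sketch in a few lines of Python. This is an illustration under assumptions, not the cart's Lua: the field width, the function names and the choice to pick a random point within the far half (rather than, say, the far edge) are all mine.

```python
import random

FIELD_WIDTH = 128  # assumed playfield width; PICO-8's screen is 128 pixels wide


def fixed_edge_spawn():
    """The early rule: always the same edge, so players can camp next to it."""
    return 0


def opposite_half_spawn(player_x, rng):
    """Spawn somewhere in whichever half of the field the player is not in."""
    half = FIELD_WIDTH // 2
    if player_x < half:
        return rng.randrange(half, FIELD_WIDTH)  # player on the left, spawn right
    return rng.randrange(0, half)  # player on the right, spawn left


if __name__ == "__main__":
    rng = random.Random(1)
    for player_x in (10, 63, 64, 120):
        print(player_x, "->", opposite_half_spawn(player_x, rng))
```

Because the spawn position depends on where the player is at that moment, there is no spot worth waiting in, which is what brings the chase back.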

What went wrong

Building before agreement. On several occasions, the AI assistant started building before the agreement step was complete. Each time it produced work that had to be thrown away. This is exactly the failure mode the four-step loop existed to prevent, and the fix was discipline, not cleverness.
Reasoning from memory instead of reading the code. Questions about how systems sequence and collide were sometimes answered from recollection of the design rather than from the actual cart, and the answers were sometimes wrong. The rule that came out of it is blunt: read the real code before answering any sequencing or collision question.
Under-counting the token cost. Size estimates were repeatedly too low, partly from trusting the over-reporting parser's number and partly from optimism. The fix was to stop estimating and always measure in the editor.

The lessons, distilled

Trust the tool's own ground-truth measurement, not a proxy that is convenient but biased.

The playtest verdict beats the theory. Don't try to build solutions for imaginary problems.

Randomisation scope must match design intent, or it invents difficulty nobody designed.

Prefer removing an interaction to adding one. Simpler is usually both better and cheaper.

Fairness is a numerical claim and needs numerical proof, not a confident argument.

When touching the scoring, enumerate every point at which the multiplier resets, every time.

Read the actual code before reasoning about behaviour.

Art direction and sprite silhouette are a human domain. Procedural substitutes are a last resort, not a shortcut.

Irreversible decisions, the one-way doors, get flagged and signed off explicitly before they are taken.

Download Big Lizard