Posted June 24, 2026 by Bulent Yusuf
Standard practice on a designed game is to write a design document up front: either a fat one covering vision, every mechanic, controls and scope, or a lean one-page pitch plus a living wiki that grows with the build.
BIG LIZARD had none of that.
The game began as a reverse-engineering exercise with no brief, and the design accreted through build-and-playtest. This is a legitimate way to make a small arcade game, but it comes with its own failure mode: scope creep, indecision and wasted tangents.
The method below is what I used in place of an up-front document, and it also kept that failure mode in check.
The build ran as a tight loop between a human designer and an AI copilot.
Me, I'm the human. I owned the art, the design intent, every gameplay decision, and final judgement on whether something felt right. Sprites and the layout of the playing-fields were mine to tinker with. Audio effects and music were validated by my ears, since an AI cannot hear the output.
The AI owned implementation only: the game logic in Lua, plus the sound effects and the music, which it wrote to an agreed spec and never went beyond.
Every change followed a fixed four-step cycle: propose, agree, build, validate. A change was described and argued for first. It was built only once explicitly agreed. It was then validated before being handed back.
The single most important rule was that nothing got built before agreement was confirmed, because in the absence of an up-front design document, that agreement step was the design document, applied one decision at a time.
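For readers who think in code, the cycle can be pictured as a small state machine in which the build step is simply unavailable until agreement has been recorded. This is only an illustrative sketch in Python: the `Change` class and `Stage` names are hypothetical, and nothing like this tooling was part of the actual project, where the gate was enforced by conversation.

```python
from enum import Enum


class Stage(Enum):
    PROPOSED = 1
    AGREED = 2
    BUILT = 3
    VALIDATED = 4


class Change:
    """One proposed change moving through propose, agree, build, validate."""

    def __init__(self, description):
        self.description = description
        self.stage = Stage.PROPOSED  # every change starts life as a proposal

    def _advance(self, required, target):
        # Refuse any step that is attempted out of order.
        if self.stage is not required:
            raise RuntimeError(
                f"cannot move to {target.name} from {self.stage.name}"
            )
        self.stage = target

    def agree(self):
        self._advance(Stage.PROPOSED, Stage.AGREED)

    def build(self):
        # The key rule: building is only legal after explicit agreement.
        self._advance(Stage.AGREED, Stage.BUILT)

    def validate(self):
        self._advance(Stage.BUILT, Stage.VALIDATED)


if __name__ == "__main__":
    change = Change("hypothetical tweak to enemy speed")
    change.agree()
    change.build()
    change.validate()
    print(change.stage.name)
```

Calling `build()` on a change that has only been proposed raises an error, which is the whole point of the rule.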
In hindsight, this probably wasn't the most efficient way to create my first game. Together, we ended up with roughly 160 different builds before we were ready to ship v1.0. But the journey was as fun as it was educational.
BIG LIZARD is dense with interacting systems and a fixed three-failure economy. A mechanic that creates an unfair trap state is therefore a real risk, and one that is not always visible by eye. So any claim to fairness had to be tested with brute force.
A test harness re-implemented the game's logic outside PICO-8 and soaked it across hundreds of thousands of randomised situations, looking for states the player could be put in with no fair way out. If a build created a trap state and couldn't pass a soak, then it went back to the drawing board.
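To give a flavour of what a soak looks like, here is a minimal sketch in Python. It is not the real harness and it does not model BIG LIZARD's rules. The five-lane playfield, the three hazards per wave, the one-step movement and the two spawner functions are all assumptions invented for illustration. The shape is the same, though: generate a random situation, ask whether the player has any fair way out, and count the cases where the answer is no.

```python
import random

LANES = 5          # hypothetical playfield width, not the game's real layout
HAZARDS = 3        # hypothetical number of hazards per wave


def reachable(lane):
    """Lanes the player can occupy after one move: left, stay, or right."""
    return [l for l in (lane - 1, lane, lane + 1) if 0 <= l < LANES]


def is_trap(lane, hazards):
    """A trap state: every lane the player can reach holds a hazard."""
    return all(l in hazards for l in reachable(lane))


def spawn_naive(rng, lane):
    """Place hazards anywhere, ignoring where the player is."""
    return set(rng.sample(range(LANES), HAZARDS))


def spawn_guarded(rng, lane):
    """Re-roll the wave until the player keeps at least one way out."""
    while True:
        hazards = set(rng.sample(range(LANES), HAZARDS))
        if not is_trap(lane, hazards):
            return hazards


def soak(spawn, runs, seed=0):
    """Run many randomised situations and count the trap states found."""
    rng = random.Random(seed)  # seeded so a failing soak can be replayed
    traps = 0
    for _ in range(runs):
        lane = rng.randrange(LANES)
        if is_trap(lane, spawn(rng, lane)):
            traps += 1
    return traps


if __name__ == "__main__":
    print("naive spawner traps:  ", soak(spawn_naive, 100_000))
    print("guarded spawner traps:", soak(spawn_guarded, 100_000))
```

In this toy model the naive spawner produces traps regularly, while the guarded one should report zero. A build passes only when the count is zero across the whole soak.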
Separately, an external parser was used to check the cart still parsed cleanly and to measure the relative size cost of a change. One hard-won rule governed that size measurement: the external parser systematically over-reports the token count by a fixed margin, so it was trusted only for relative deltas and parse-checking, never for the absolute number.
The absolute token figure was always read from PICO-8's own editor, which is the ground truth.
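The reason a biased parser is still safe for deltas is plain arithmetic: if every reading is inflated by the same constant, the constant cancels when two readings are subtracted. The Python sketch below illustrates that with invented numbers. The bias of 37 and the before-and-after counts are hypothetical, and the helper names are mine. The 8192 figure is PICO-8's token cap.

```python
PICO8_TOKEN_LIMIT = 8192  # PICO-8's token cap for a cart


def relative_cost(parser_before, parser_after):
    """Token cost of a change, taken from two external-parser readings.

    A constant over-report is present in both readings, so it cancels
    in the subtraction and the delta can be trusted.
    """
    return parser_after - parser_before


def headroom(editor_tokens, limit=PICO8_TOKEN_LIMIT):
    """Tokens left before the cap, using the editor's own count only."""
    return limit - editor_tokens


if __name__ == "__main__":
    bias = 37                              # hypothetical fixed over-report
    true_before, true_after = 7900, 7946   # hypothetical editor readings

    # The parser sees inflated numbers, but the difference is unaffected.
    cost = relative_cost(true_before + bias, true_after + bias)
    print(cost)                   # 46

    # Absolute headroom must come from the editor figure, not the parser.
    print(headroom(true_after))   # 246
```

Feeding a parser reading into `headroom` would understate the remaining space by the size of the bias, which is exactly the mistake the rule is there to prevent.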
Trust the tool's own ground-truth measurement, not a proxy that is convenient but biased.
The playtest verdict beats the theory. Don't try to build solutions for imaginary problems.
Randomisation scope must match design intent, or it invents difficulty nobody designed.
Prefer removing an interaction to adding one. Simpler is usually both better and cheaper.
Fairness is a numerical claim and needs numerical proof, not a confident argument.
When touching the scoring, enumerate every point at which the multiplier resets, every time.
Read the actual code before reasoning about behaviour.
Art direction and sprite silhouette are a human domain. Procedural substitutes are a last resort, not a shortcut.
Irreversible decisions, the one-way doors, get flagged and signed off explicitly before they are taken.