An end-to-end, intuition-first explanation of Gas Town, Steve Yegge’s multi-agent coding orchestrator; how it has evolved into Gas City; and what its design says about the next layer of developer tools. About a 30-minute read.
Prologue: what must be made durable
1. What a coding agent actually produces
The argument starts with a small reframing. When you run Claude Code on a task, it feels like the artifact is a diff. The agent reads, the agent thinks, the agent writes. You merge the diff and you’re done.
But the diff is not really what the agent produces. The diff is what a session produces, using the agent. The agent itself produces something more like a trail: a sequence of decisions, lookups, mistakes, partial fixes, half-remembered constraints, scratch ideas about how to do the next part. The diff is a snapshot of where the trail happened to be when the context window ran out.
This trail is mostly invisible to you, mostly invisible to the next session, and entirely lost when the context window fills up. The next session starts from zero. Whatever the previous session learned, it learned for the duration of one context window, and then it took it to the grave.
Most of the friction in working with a single coding agent is downstream of this. The agent rediscovers facts you already taught it. It pursues approaches you already rejected. It writes the same broken function twice. It reaches the end of its context with the work two-thirds done and no good way to hand off. None of this is the model’s fault, exactly. It’s the medium.
The job of an orchestrator (which is what Gas Town is) is to fix the medium. You can’t preserve the cognitive trail itself — the LLM’s scratch reasoning is gone the moment the context fills up, and no orchestrator gets that back. But you can preserve the result of the trail: the work that’s been done, the decisions that have been made explicit, the open questions a future session can pick up. Make the work survive the session, let multiple agents pass it between them without losing what’s on it. Once you accept that as the goal, almost everything else in Gas Town’s design follows.
2. The naive answer, and why it explodes
Suppose you accept that one Claude Code is not enough. You want twenty.
The naive way to do this is tmux new-window twenty times. Each window runs a fresh Claude Code in a separate worktree. Each one gets its own slice of work. You walk between them like a foreman, checking on each, prompting them when they get stuck, copying their output back to a master plan when they finish.
This works for about an hour. Then it doesn’t.
Three things go wrong, all at the same time. First, the agents start stepping on each other. Two of them rebase against the same branch and pick different conflict resolutions, and now your repo has a private fork. Second, you forget what each window is doing. They all started with “let’s go,” and after fifty turns of dialog the windows look identical. You cannot remember which one was redoing the auth migration and which one was fixing the flaky test. Third, half the agents have hit context exhaustion and are sitting silently waiting for input, which they are not going to receive because you are busy talking to one of the other agents who is currently telling you a long story about why the database schema is wrong.
Yegge’s framing for the resulting environment is, by his own account, the project’s namesake. Gas Town is “an industrialized coding factory manned by superintelligent robot chimps, and when they feel like it, they can wreck your shit in an instant.” The chimps are competing for tokens and breaking each other’s work. You are armed only with a copy of tmux send-keys.
The naive answer fails because it has no model of what the work is. The agents have no shared notion of what’s done, what’s queued, what’s been merged, what’s currently being attempted. There’s no merge queue, no health monitor, no priority. There’s just twenty independent processes shouting git push --force.
You need a substrate. That’s where Beads comes in.
3. Beads, or what work looks like when AI reads it
Beads is Yegge’s earlier project, launched October 2025. It’s a git-backed issue tracker. Originally each issue was a JSON object on its own line in a JSONL file kept in your repo, with a SQLite cache layer hydrating from that file for fast queries. As of v1.0 in April 2026, the storage layer is embedded Dolt, a git-versioned SQL database. The principle hasn’t changed: work-as-data, in version control, queryable by an LLM. Just the substrate did.
If you’ve used Linear or Jira, this looks like a step backwards. You’re trading a slick web UI for a CLI and some structured text in a directory called .beads/.
But there are two things Beads gets right that Linear and Jira can’t.
The first is that the issues are in the repo. The work tracker lives where your code lives, version-controlled by git. In the original SQLite+JSONL Beads, this meant your branch and your work tracker moved together: checkout an old commit, the issues rewound to match; fork a branch, the work plan forked with it. Beads v1.0’s default Classic mode currently keeps all issues on Dolt’s main branch regardless of your git branch position; restoring the original branch coupling is on the Dolt team’s roadmap but not implemented yet. Either way, the issues are queryable structured data your agent can read directly, with full history committed alongside your code, no API call required.
The second is the data format. Whether the format is JSONL or a Dolt table, the agent gets structured records it can read, edit, append to, and version. There’s no opaque database, no auth flow, no rate limit, no schema migration. The agent reads the work tracker the same way it reads your code: as queryable structured data, version-controlled by git.
# Original Beads (Oct 2025 - early 2026): JSONL in your repo
{"id":"bd-a7je4","title":"fix rate limiter race","status":"ready",...}
{"id":"bd-a7je5","title":"add backpressure metric","status":"blocked",...}
# Beads v1.0 (Apr 2026): same shape, embedded Dolt as the store
SELECT id, title, status FROM beads WHERE assignee = 'sully';
This is the foundation. Everything else in Gas Town is built on the assumption that work is a thing you can read, edit, append to, and version, the same way you do with code. Once you accept that as the substrate, a number of moves that look weird start to make sense.
4. The unit-of-work inversion
Here’s the second move, and it’s the central one.
A Claude Code session runs for a while and then ends. Either you /exit, or the context runs out, or the host process gets killed. The session has a beginning and an end. It is, in Kubernetes terms, a pod: ephemeral, replaceable, anonymous. You don’t name your pods.
A deployment, on the other hand, is persistent. It has a name, a role, an identity, a history. The deployment outlives any individual pod. Pods come and go; the deployment proceeds.
Gas Town does the same inversion at the agent layer. Sessions are cattle. Agents are pets. An agent in Gas Town is not a session. An agent is a Bead, a row in your work tracker, with a name (Sully, Cooper, Maxine), a role (polecat, crew, refinery), a hook (a queue of pending work), and a CV (a list of completed convoys, which is what Gas Town calls a batch of related beads worked in parallel and landed together). The Bead lives in git. The CV is committed. The hook is committed. The agent persists across crashes, compactions, restarts, and machine reboots, because the agent is data.
The session is the thing you spawn when the agent has work to do. The session reads the agent’s identity from Beads, reads its hook to find the work, does the work, commits the result, and exits. Another session can come along five minutes later and pick up where the last one left off, because the work is still on the hook and the hook is in git.
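The inversion is easier to see as data than as prose. Here it is as a minimal Go sketch; every type name and field below is invented for illustration (Gas Town’s real schema isn’t reproduced here), and the point is only that the durable thing is the record, not the process.

package main

import "fmt"

// Agent is the persistent identity: a row in the work tracker,
// committed to version control. It survives crashes and reboots
// because it is data, not a process.
type Agent struct {
	Name string   // "sully", "cooper", "maxine"
	Role string   // "polecat", "crew", "refinery"
	Hook []string // bead IDs queued for this agent
	CV   []string // record of completed work
}

// runSession is the ephemeral part: borrow the identity, drain the
// hook, record results, exit. A later session picks up whatever is
// still on the hook.
func runSession(a *Agent) {
	for len(a.Hook) > 0 {
		bead := a.Hook[0]
		fmt.Printf("[%s/%s] working bead %s\n", a.Name, a.Role, bead)
		// ... do the work; only then does the bead leave the hook ...
		a.Hook = a.Hook[1:]
		a.CV = append(a.CV, bead)
	}
}

func main() {
	sully := &Agent{Name: "sully", Role: "polecat",
		Hook: []string{"bd-a7je4", "bd-a7je5"}}
	runSession(sully) // if this session dies mid-bead, the hook still holds it
}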
Once you make this inversion, the context-window problem becomes mundane. Sessions hit context limits all the time. Who cares. The agent is fine. Spawn another session. The work is still where it was. The agent’s name is still on it.
This is the conceptual hinge of Gas Town. Almost every other design choice is in service of making this inversion structurally sound.
Act 1: The Ledger
5. MEOW, from the bottom up
Once you’ve accepted that work is data, the next question is: what kind of data? A flat list of issues is fine for one human, fine for one agent. It breaks the moment your work has shape. “Migrate the auth schema, then in parallel update five backend services, then in parallel update three frontends, then deploy” is not a list. It’s a graph with sequencing, branching, and joins. You can hold it in your head. You cannot sling it onto an agent’s hook as one ticket. You need composition: a way to say “this big thing is made of these smaller things, in this order, with these constraints, and a worker can pick up wherever the previous one left off.”
The MEOW stack (Molecular Expression of Work) is Yegge’s answer. It has four levels of composition (Beads, Molecules, Protomolecules, Formulas), with Epics living sideways as beads-with-children rather than as their own layer.
Beads are the atoms. A bead is a single issue. It can be claimed, worked on, closed, blocked. Most bugs and features start life as a bead.
Epics are beads that have children. The children can be epics themselves. The children are parallel by default; you add explicit dependencies if you need sequencing. Epics give you top-down planning: “ship the new auth flow” decomposes into “design the schema,” “implement the backend,” “implement the frontend,” “write the tests,” and so on. The leaves get done first; the root finishes last.
Molecules are workflows. They’re bead chains with arbitrary shapes, including branches, joins, gates, and loops. Crucially, a molecule is executable. The agent walks the chain one bead at a time, claiming and closing each in turn. Because the chain is in git, the agent can crash partway through, and the next session can resume from wherever the last claim was.
Protomolecules are templates. A 20-step release process is a useful thing to have written down. A 20-step release process you can stamp out fresh on demand, with all the dependencies pre-wired, is more useful. A protomolecule is the template; a molecule is an instance, created by copying the template and substituting in the variables.
Formulas are the source form. They’re TOML files describing a workflow declaratively, with macros and gates and loops. A formula gets cooked into a protomolecule, which gets instantiated into a molecule, which gets executed by an agent. Three layers of indirection, each earning its keep: formulas are reusable, protomolecules are concrete graphs, molecules are running instances.
There’s also an escape hatch called wisps: ephemeral beads that live in the working database but never get committed to the repo. The orchestration agents (Refinery, Witness, Deacon) generate hundreds of wisps an hour for their internal patrol workflows. Persisting them all would bury your repo in orchestration noise, so wisps stay out of git and burn at the end of the run.
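One hypothetical way to render the stack as data, with invented type names (this is the shape the article implies, not Gas Town’s actual schema; formulas, the TOML source form, would cook down into the Protomolecule shape below):

package meow

// Bead is the atom. An epic is just a bead with children.
type Bead struct {
	ID        string
	Title     string
	Status    string   // "ready", "blocked", "done", ...
	Children  []string // non-empty means this bead is an epic
	DependsOn []string // explicit sequencing between siblings
	Ephemeral bool     // true for wisps: never committed to the repo
}

// Molecule is an executable chain of beads with arbitrary shape.
// An agent claims and closes steps one at a time; crash recovery
// is just re-reading the chain.
type Molecule struct {
	ID    string
	Steps []Bead
}

// Protomolecule is a template with holes. Instantiating it stamps
// out a fresh Molecule with the variables substituted in.
type Protomolecule struct {
	Name  string
	Steps []Bead
	Vars  map[string]string // e.g. {"release_version": ""}
}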
If you’ve worked on workflow systems, this looks familiar. You’ve seen something like it in Airflow (DAGs and tasks), in Temporal (workflows and activities), in Argo (workflow templates and workflows). The interesting thing about MEOW is not the structure. The interesting thing is that it lives in git, with the same affordances as your codebase. An agent can read your work graph the way it reads your imports.
6. GUPP, or how to fight LLM politeness
OK, so work is in git. Workflows are in git. Agents are in git. How does an agent know to start working?
Each agent has a hook. The hook is a special pinned bead, attached to the agent’s identity. When you want to give an agent work, you gt sling it onto the agent’s hook. The work sits there, waiting.
The principle, called GUPP (Gastown Universal Propulsion Principle), is short:
If there is work on your hook, you must run it.
That’s the whole rule. Every agent is prompted with it on startup. The agent’s first action, on every session, is to check its hook. If something’s on the hook, the agent starts working. If not, the agent waits or sleeps.
This is what makes Gas Town go without you driving it. You sling work onto a hook and walk away. The next time a session spins up for that agent, it picks up the work and runs.
In theory.
In practice, Claude Code is too polite for its own good. The startup prompt explicitly tells the agent to check its hook and act. The agent acknowledges this. It says “I’ll check my hook now.” Then it sits there, waiting for you to type something. Anything. Until the user has produced some input, the agent doesn’t believe it has permission to actually do work.
Yegge calls this “physics over politeness.” The agent is physically allowed (and instructed) to do the work. It has the prompt. It has the tools. It has the hook. But its training pushes it toward waiting for human input, hard. So Gas Town has to nudge.
The nudge is mechanical: a tmux send-keys that fires a few seconds after the session starts, simulating user input that says, in effect, “do your job.” This bypasses the politeness deadlock. The agent reads the nudge, checks its hook, and starts working.
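In Go, for concreteness, the whole mechanism is about this big. The pane name and nudge text below are made up; only the tmux send-keys invocation itself is standard tmux usage.

package main

import (
	"os/exec"
	"time"
)

// A sketch of the politeness nudge: wait a few seconds after the
// session starts, then simulate user input arriving in its pane.
func nudge(pane string) error {
	time.Sleep(5 * time.Second) // let the session finish starting up
	return exec.Command("tmux", "send-keys", "-t", pane,
		"Check your hook and begin work.", "Enter").Run()
}

func main() {
	_ = nudge("gastown:polecat-sully") // hypothetical tmux target
}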
There is something funny about this. A meaningful chunk of the orchestrator’s complexity is dedicated to working around an alignment artifact. The agent has been so thoroughly trained to wait for the human that it cannot be unblocked by a system prompt alone. It needs to see input arrive. So the system fakes input arriving.
I’d guess this layer vanishes over the next two model generations. The model providers know about this failure mode. Once Claude Code (and Codex, and Gemini CLI) ship versions willing to act on hook work without a manual prompt, GUPP becomes purely declarative. Until then, GUPP is a contract between the orchestrator and the model that requires a small amount of theater to enforce.
7. Nondeterministic idempotence
GUPP gets the agent moving. The next question is what keeps the work durable when the session dies mid-step. That’s the runtime model.
Temporal, the workflow engine that Yegge gestures at as a comparison, has a beautiful property called deterministic durable replay. You write a workflow as ordinary code. Temporal records every step it takes (every API call, every wait, every result). When the workflow crashes and resumes on a new machine, Temporal replays the recorded history to reconstruct local state, and then continues from where it left off. The replay is deterministic: given the same history, the workflow code produces exactly the same execution. This is what makes Temporal workflows feel like code that just doesn’t crash.
Gas Town can’t do this. The reason is simple: the steps are written by an LLM. There’s no determinism. If you replay a molecule, the agent making decisions inside each step will produce different output, take different code paths, and possibly reach a different end state.
So Gas Town gets durability the other way. Yegge calls it Nondeterministic Idempotence (NDI). The shape is:
- The workflow (the molecule) is persistent. It’s a chain of beads in git.
- Each step has well-specified acceptance criteria. The bead describes what “done” looks like, concretely enough that an agent reading the bead can tell.
- Agents claim steps one at a time, do them, and close them. A closed step is closed. An open step is open. There’s no in-between.
- If a session crashes mid-step, the step stays open. The next session reads it, picks up where the last one left off (or starts the step over, if that’s cheaper), and closes it.
- The path is fully nondeterministic. The outcome is the workflow, completed.
It’s a clean design. The bead acceptance criteria are the contract. The git history is the substrate. The agent is the reasoning engine that figures out, on each step, what “completing this bead” means in the current state of the world. If an agent makes a mistake, the next agent reads the bead, sees the work didn’t match the criteria, and corrects it.
NDI is weaker than Temporal’s guarantees. Temporal will execute exactly the workflow you wrote. NDI will execute some workflow that satisfies the acceptance criteria you wrote, and the path may meander. For the kinds of work coding agents do (where there are usually many valid solutions and the goal is the outcome, not the trajectory) this is plenty, as long as the acceptance criteria are tight enough to catch trajectory-sensitive failures like races or partial migrations. Where they aren’t, the LLM can pass each step and still produce a globally wrong result, and you fall back to the script approach for those steps.
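The loop that falls out of this is small enough to sketch. Everything below uses invented names and assumes a mechanical acceptance check per step; it’s the shape the bead chain implies, not Gas Town’s code.

package main

import "fmt"

// A step is claimed, worked nondeterministically, then checked
// against its acceptance criteria before closing. A crash between
// claim and close leaves the step open for the next session;
// closed steps are skipped on resume.
type Step struct {
	ID       string
	Closed   bool
	Accepted func() bool // mechanical check of the bead's "done" criteria
}

func runMolecule(steps []Step, work func(Step)) {
	for i := range steps {
		s := &steps[i]
		if s.Closed {
			continue // idempotence: resume skips finished work
		}
		work(*s) // the path varies run to run; only the outcome is checked
		if !s.Accepted() {
			fmt.Printf("step %s failed acceptance; left open for the next session\n", s.ID)
			return
		}
		s.Closed = true
	}
}

func main() {
	steps := []Step{
		{ID: "bd-1", Accepted: func() bool { return true }},
		{ID: "bd-2", Accepted: func() bool { return true }},
	}
	runMolecule(steps, func(s Step) { fmt.Println("agent works", s.ID) })
}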
There’s parallel academic work pointing in the same direction. The MAKER paper (Meyerson et al., Nov 2025) reports completing a single task spanning over a million dependent LLM steps with zero errors, by extreme decomposition into single-purpose microagents combined with multi-agent voting at each step. The benchmark is constructed; the principle is what travels: long LLM workflows complete reliably when each step is atomized, well-specified, and individually checkable. NDI is the less clinical version of the same insight, with Gas Town adding the working substrate and the role hierarchy on top.
The path is whatever the agent decides. The destination is whatever the bead says.
Checkpoint: what you know now.
Coding agents fail at scale because the work loses durability between sessions. Gas Town fixes this by inverting the unit-of-work: agents become persistent identities in git, sessions become disposable processes that borrow those identities for an hour at a time.
Work itself is structured into a four-level composition stack (Beads, Molecules, Protomolecules, Formulas), with Epics sitting sideways as beads-with-children. The atomic level is Beads, which started as JSONL and migrated to embedded Dolt at v1.0. Higher levels compose into runnable workflows. Wisps give you ephemeral workflows for orchestration noise.
The runtime model is Nondeterministic Idempotence: workflows live in git as bead chains, each step has clear acceptance criteria, sessions claim and close steps, and crashes are recovered by the next session re-reading the chain. The path varies; the outcome is invariant.
What we haven’t covered yet: how agents in the swarm avoid eating each other.
Act 2: The Town
8. Seven roles, two tiers
A durable bead tells you what work exists. It doesn’t tell you who notices when nobody is moving it. It doesn’t pick the next polecat to spawn, decide whether two agents are about to step on the same file, or restart a Witness that quietly died at 2 AM. The ledger is necessary. It isn’t sufficient. You need agents whose job is watching other agents, and rules for who reports to whom.
Gas Town’s answer is seven roles, organized into two tiers. Twenty to thirty agents work concurrently across one or more git repos (called rigs); the coordination is what keeps them from eating each other.
Three roles operate at the town level (across all rigs):
The Mayor is your front desk. It’s the agent you talk to most. When you have an idea (“we need to fix the rate limiter”), you talk to the Mayor. The Mayor files beads, drafts plans, slings work to other agents, and gathers status when convoys land. It’s the only agent with a global view of the town.
The Deacon is the watchdog. It runs a patrol loop, every couple of minutes, checking on the health of the town. It can dispatch maintenance work, restart stuck agents, escalate genuine problems to you. The Deacon is what makes Gas Town keep running while you’re at lunch. (It is also, in early Gas Town versions, the role most likely to go feral. Yegge has talked openly about the Deacon’s tendency to go on “serial killer sprees, viciously taking out random workers mid-job” — calling it the modern-day Butler in the Gas Town murder mysteries — and at one point recommended disabling it. Largely fixed by v1.0.)
The Dogs are the Deacon’s hands, plus one structural exception. The patrol loop got too crowded with chores (clean up stale branches, run plugins, GC merged worktrees), so the Deacon hands those off to dedicated maintenance agents. The exception is Boot the Dog, who isn’t a maintenance helper at all: Boot wakes up every five minutes and checks whether the Deacon is alive. Boot watches the Deacon; the other Dogs work for the Deacon. We’ll come back to that distinction in section 10.
Four roles operate at the rig level (per repo):
Polecats are the workers. They’re ephemeral, named, claimed for a task, and decommissioned when the task is done. A polecat picks up a bead, opens a worktree, does the work, files a merge request, and gets recycled. Polecats are how throughput happens.
The Refinery is the merge queue. When ten polecats finish their work simultaneously, you can’t just merge ten branches into main. You’d get conflicts, broken builds, and a fork no one understands. The Refinery serializes merges, intelligently rebases, runs tests, and lands one MR at a time.
The Witness is the per-rig health monitor. Same idea as the Deacon, but scoped to one repo. The Witness keeps polecats unstuck, prods the Refinery if the queue stalls, and escalates to the Deacon if a problem is bigger than one rig.
The Crew are your collaborators. Crew agents are persistent, named (you pick the names), and work on whatever you tell them. They’re the equivalent of the Claude Code session you used to keep open all day, except now there are several of them, each with its own identity and history. Crew is for the work where you want to be in the loop. Polecats are for the work where you don’t.
If you squint, this looks like Kubernetes. The Mayor is the scheduler. The Deacon is the controller manager. The Witness is the kubelet. The Polecats are the pods. Beads is etcd. Yegge points this out at the end of his launch post and treats it as a discovery: these are the natural shapes that emerge when you need to herd cats at scale.
I think the Kubernetes resemblance is more than a coincidence. The shape of any system that coordinates unreliable workers toward a goal converges on something like this. There’s a top-level brain. There’s a thing that watches the watchers. There’s a queue. There’s a recycler. There’s an audit log. There’s a place to put the source of truth. The names change. The shape doesn’t.
The interesting difference, and Yegge calls this out, is what each system optimizes for. Kubernetes asks “is it running?” It tries to keep N replicas alive forever. Gas Town asks “is it done?” It tries to land a convoy and then nuke the worker. K8s pods are anonymous; Gas Town polecats have names and CVs. K8s reconciles toward a continuous desired state; Gas Town proceeds toward a terminal goal. Same engine shape, different destination.
9. The Refinery, or why the merge queue is the actual product
If I had to point at one role and say “this is the thing that makes Gas Town more than a wrapper around tmux,” it would be the Refinery.
Here’s the problem the Refinery solves. You have ten polecats working in parallel, each on its own worktree, against a base branch that started identical. They each finish around the same time and submit their merge requests. Now you have ten changes that all want to land. They overlap in places. They were each written assuming the base branch wouldn’t move. You merge the first one. The other nine are now stale.
A naive solution: rebase all nine and merge them in order. Half of them have textual conflicts. A few have semantic conflicts that the textual diff doesn’t catch. One touched the same file the second one already changed, and the rebase produces a perfectly valid-looking patch that breaks the test suite.
The Refinery is an agent specifically prompted for this. It uses what’s called a Bors-style bisecting merge queue (a pattern named for Bors, the merge bot the Rust project ran for years; Bors-NG is its open-source descendant). The pattern is: instead of testing each MR one at a time, you batch them together, merge the batch onto main, and run the test suite once. If the batch passes, all the MRs land. If the batch fails, you split it in two and retry each half. The bad MR gets isolated in O(log N) batches instead of O(N) sequential tests, which is a meaningful difference when each test run takes ten minutes and you have thirty MRs in the queue.
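A sketch of the bisection, with testBatch standing in for “merge these MRs onto a throwaway branch and run the suite.” The recursion assumes a half that passes in isolation also passes once the other half’s survivors land first, the same optimistic assumption batch queues make; the names are mine, not the Refinery’s.

package main

import "fmt"

// land returns the MRs that can merge and the ones isolated as bad.
// One bad MR among N is found in O(log N) suite runs, not O(N).
func land(mrs []string, testBatch func([]string) bool) (landed, rejected []string) {
	if len(mrs) == 0 {
		return nil, nil
	}
	if testBatch(mrs) {
		return append([]string(nil), mrs...), nil // batch green: all land in one run
	}
	if len(mrs) == 1 {
		return nil, []string{mrs[0]} // culprit isolated
	}
	mid := len(mrs) / 2
	l1, r1 := land(mrs[:mid], testBatch)
	l2, r2 := land(mrs[mid:], testBatch)
	return append(l1, l2...), append(r1, r2...)
}

func main() {
	test := func(batch []string) bool {
		for _, mr := range batch {
			if mr == "mr-7" { // pretend mr-7 breaks the build
				return false
			}
		}
		return true
	}
	landed, rejected := land([]string{"mr-1", "mr-2", "mr-7", "mr-9"}, test)
	fmt.Println("landed:", landed, "rejected:", rejected)
}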
Beyond the bisection, the Refinery has standing orders for what to do with the bad MR: ask the polecat to redo against new HEAD, replace it with another polecat’s work, or escalate. It runs tests after each merge. It can restart its session and pick up where it left off, because the merge queue is in beads.
This is one role, but it’s the role that makes the whole system tractable. Without the Refinery, parallel polecats are a pretty way to corrupt your repo. With it, you can let twenty agents work overnight and wake up to a clean main branch with twelve features merged and two convoys flagged for your review.
There’s a lesson here that goes well beyond Gas Town. The hardest problems in multi-agent systems are not the agents. They’re the points where parallel work tries to become serial. The merge queue. The shared resource. The handoff. Whoever owns those choke points owns the throughput. Most of the engineering effort in Gas Town went into the Refinery, the Witness, and the Deacon, not into making Polecats smarter. Polecats are just Claude Code. The thing Gas Town adds is the layer that turns a swarm of polecats into a working factory.
10. Self-watching systems
The other architectural choice worth pausing on is the patrol-and-heartbeat structure.
You’d think a multi-agent system would be event-driven. Polecat finishes a task, emits an event, Refinery wakes up and merges. Witness emits a “stuck” event, Deacon dispatches a Dog. This is the design you’d draw on a whiteboard.
Gas Town does some of this, but the dominant pattern is patrols: long-running loops that wake up periodically, check on the world, and act. The Deacon runs a patrol every couple of minutes. Each Witness runs a patrol per rig. The Refinery’s main loop is a patrol that drains the merge queue.
Patrols have exponential backoff. If a Witness wakes up and finds nothing to do, it goes back to sleep with a longer interval. If something does happen (a mutating bd or gt command), a daemon wakes the relevant patrol agents up.
Why patrols and not pure events? Because LLM agents are flaky in ways event systems don’t tolerate. An agent might miss an event. It might process the event and then go off and do something unrelated. It might get distracted by its own scratch thoughts and forget to come back. You don’t want your merge queue depending on every relevant event reliably making it to a single subscriber.
A patrol is forgiving. It re-checks the state of the world from scratch every iteration. If the previous iteration missed something, the next one catches it. If an agent dies in the middle of a patrol, the next agent picks up where it left off. The system is resilient to its own components being unreliable, because the components keep checking each other.
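A patrol loop with backoff is a few lines. The names below are invented, and the wake channel stands in for the daemon that pings patrol agents when a mutating bd or gt command runs.

package town

import "time"

// checkWorld re-derives everything from current state on every pass,
// which is what makes the pattern forgiving: a missed event costs one
// interval, not correctness.
func patrol(checkWorld func() (didWork bool), wake <-chan struct{}) {
	const floor, ceiling = 30 * time.Second, 10 * time.Minute
	interval := floor
	for {
		if checkWorld() {
			interval = floor // activity: stay alert
		} else {
			interval *= 2 // quiet: back off exponentially
			if interval > ceiling {
				interval = ceiling // but never stop checking entirely
			}
		}
		select {
		case <-time.After(interval):
		case <-wake: // something mutated; patrol again immediately
			interval = floor
		}
	}
}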
Boot the Dog is the funniest, most-correct example of this principle. The Deacon patrols the town. But what if the Deacon falls asleep, or gets distracted, or quietly dies? Boot the Dog wakes up every five minutes and pokes the Deacon. Boot’s whole job, on each wakeup, is to decide whether the Deacon needs a heartbeat, a nudge, a restart, or simply to be left alone, and then go back to sleep. It is the single dumbest agent in the whole system, and it has to be, because Yegge couldn’t trust the Deacon to wake itself up reliably.
The general principle: in a system of unreliable agents, the watcher needs a watcher. Eventually, you reach a primitive (a daemon, a cron job, something deterministic) and the chain terminates. Everything beneath that primitive is an LLM checking whether the LLM beneath it is doing its job. This is a humbling architecture pattern. It says the way to build reliable systems out of unreliable agents is not to make the agents more reliable; it’s to make the agents watch each other.
Checkpoint: what you know now.
Gas Town has seven roles in two tiers. The town tier (Mayor, Deacon, Dogs) coordinates across all repos. The rig tier (Witnesses, Polecats, Refinery, Crew) does the per-repo work.
The Refinery is the role that earns its keep on day one. It uses a Bors-style bisecting merge queue to land parallel MRs in log N test runs instead of N.
Reliability is not in any single agent. It’s in the chain of patrols and watchers, with a deterministic daemon at the top of the chain heartbeat-pinging every three minutes.
Act 3: The City
11. From Gas Town to Gas City
Gas Town shipped with seven hardcoded roles. They are the right shape for coding agents. They aren’t the right shape for moderating image submissions, running a nightly ops dashboard, or grading customer support tickets. The town worked; the town didn’t generalize. The next problem isn’t running one swarm. It’s stamping out many differently shaped swarms.
Gas City is the answer.
Some chronology, because the timeline matters. Gas Town launched January 1, 2026: 17 days of vibe-coded Go, 75,000 lines, 2,000 commits. The community Discord and a handful of Fortune 100 trial deployments showed up almost immediately. By early March it had an 8-stage adoption framework, a Wasteland federation layer (a Dolt-backed trust network where thousands of Gas Towns can share a common Wanted Board and pick up work other Gas Towns post), and a measurable cost (one early adopter reported $100/hour in API spend during heavy use).
By April both Gas Town and Beads hit v1.0. Beads’ storage layer migrated from JSONL+SQLite to embedded Dolt, a git-versioned SQL database; the “single binary, no daemon” experience came back, with version history and SQL queries built in. Yegge’s framing for the Dolt migration is that it’s been the unsung hero of the whole stack: Dolt gives the agents Git-style branch/merge semantics on structured data, which is exactly what a federated work system needs.
Then in late April, Yegge announced Gas City. Gas City is Gas Town disassembled and rebuilt as an SDK: roles, topology, and orchestration patterns become declarative packs that you can deploy and run. Yegge didn’t write Gas City himself; it was built by Julian Knutsen and Chris Sells from the community, after Yegge sketched the vision. Gas Town is one factory; Gas City is the toolkit for stamping out new ones.
The strategic change is bigger than the technical one. The original Gas Town pitch was “Kubernetes for coding agents.” The Gas City pitch is something more ambitious: agents-running-themselves, for any business process. Industry calls a facility that runs without humans watching a “dark factory.” Yegge reframes Gas City as a “light factory” — or, in his phrasing, “a very well-lit dark one!” — because every action is observable and auditable in Dolt.
The example he gives is from his own MMORPG, Wyvern. Players who hit level 25 get to upload custom character images. Yegge has to review them for inappropriate content and approve or reject them. This is a small recurring task that has nothing to do with code. He describes a two-agent pack: one agent reviews each submission, another agent checks the first agent’s work. Two agents instead of one because, as he puts it, any agent can go temporarily insane at any time. A second pair of eyes is cheap insurance.
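For shape, here’s that two-agent pack rendered as Go data. Every field name here is invented; Gas City’s real packs are declarative TOML, and this sketch only shows the topology the example implies.

package main

import "fmt"

type AgentSpec struct {
	Name   string
	Prompt string
	Checks string // name of the agent whose output this one audits
}

type Pack struct {
	Name    string
	Trigger string // what wakes the pack up
	Agents  []AgentSpec
}

var imageReview = Pack{
	Name:    "wyvern-image-review",
	Trigger: "new character image submitted",
	Agents: []AgentSpec{
		{Name: "reviewer",
			Prompt: "Approve or reject the submission against the content policy."},
		{Name: "auditor", Checks: "reviewer",
			Prompt: "Re-review the reviewer's decision; escalate disagreements."},
	},
}

func main() {
	fmt.Printf("pack %q runs %d agents\n", imageReview.Name, len(imageReview.Agents))
}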
Packs make the orchestrator generalize. What you’d actually use one for is a separate question.
12. Replacing SaaS, badly enough
The bigger pitch in the Gas City post is about replacing SaaS. The argument goes like this:
- SaaS started as a way for companies to share fixed costs across many customers.
- It evolved into the superset of every customer’s needs. Most customers use 20% of the features and subsidize the other 80%.
- The 20% you actually use can be rebuilt in-house, badly, by an agent crew. And badly is good enough for a 200-person company, because you don’t need Salesforce-grade compliance for a 200-person workflow.
- Gas City is the substrate for building the in-house replacement: declarative packs, git-versioned audit trails, identity, memory, recovery from failure.
I was initially skeptical of this thesis. Replacing Atlassian with a Gas City pack is a much bigger lift than replacing your image-moderation queue. The evidence so far is two points, not a curve. Both are worth dwelling on.
The small end is from Yegge’s v1.0 post. He describes a non-technical Comms major, four years out of school, who has been running Gas Town since “a few weeks after it came out” and has built an in-house replacement for a niche but pricey SaaS product her company was paying for. The substrate is now cheap enough that the rebuild does not require an engineer.
The large end is from a DoltHub blog post in late March. Tim Sehn, the CEO of DoltHub (and an old colleague of Yegge’s from Amazon), used Gas Town to rewrite roughly 30,000 lines of Dolt’s storage backend from Go into C, embedded inside SQLite. By 4pm that day, all 87,000 SQLite acceptance tests were passing. He had not written serious C in twenty years. The whole experiment cost him about $3,000 in API credits. An experienced systems engineer using Gas Town to do something he plausibly could not have done by hand at all.
The Comms major’s case is small-SaaS displacement. Sehn’s is an experienced engineer using Gas Town to punch through a productivity ceiling. Whether the wide territory between them (generic business processes run by small in-house packs that pay for themselves) is full or empty, no one knows yet. That’s the Gas City bet, and the Gas City pivot says Yegge is making it. The interesting layer of the stack is no longer the coding orchestrator; it’s the layer above it, where you stamp out small agent crews for whatever business process needs running. The coding case is just the first one anyone got working.
The skeptical case is worth registering. SaaS isn’t only the 20% of features you actually use. It’s the SOC 2 audit, the SSO integration, the data residency guarantees, the SLAs, the on-call team that takes incident calls at 2 AM, and the legal liability that travels with all of those. A pack that approves Wyvern character submissions doesn’t need any of that. A pack that touches payroll, payments, customer PII, or anything regulated does. The question isn’t whether you can build a working in-house pack — the Comms major proves you can. It’s whether operating, securing, and being legally responsible for that pack stays below the SaaS bill once you add up everything the SaaS was actually doing. For small niche workflows, the math probably works. For replacing your CRM or your accounting system, probably not. Gas City’s territory is the middle, and the shape of that middle is what’s still empirical.
13. Locate yourself before installing
Sitting alongside the technical pieces of the project is a framework that doesn’t appear in the codebase at all. Yegge sketched it at Gas Town’s launch as eight stages of AI-assisted coding adoption; with Gas City, he extended it to eleven. It is the most opinionated piece of advice anyone associated with the project gives, and it is the part most likely to be ignored.
The point of the ladder is that Gas Town is not a beginner tool. Beads, hooks, the Refinery, the Deacon, NDI — every part of it is the answer to specific kinds of pain. If you haven’t felt the pain, you can’t tell which parts are load-bearing and which are baroque. You’ll bypass the load-bearing parts, and your repo will catch fire.
Yegge’s three cut points are blunt. Stage 7 and up: install Gas Town. By the time you’re hand-managing ten or more agents, you’ve already lived inside the problems Gas Town’s roles are designed for. The Refinery makes obvious sense. The Witness makes obvious sense. You’ll know which knobs to leave alone. Stage 5–6: don’t install it. One agent, or a small handful, is fine; the orchestrator’s overhead will eat you, and you’ll fight machinery you don’t yet need. He has begged the community, repeatedly, not to do this. Stage 1–4: don’t even read about it. The framework is an answer to a question you haven’t asked.
The shape of the extension is its own story. Stages 9, 10, 11 are no longer about coding. They’re the trajectory of someone who got coding-orchestration working and is now stamping out small autonomous packs for the rest of the business: image moderation, compliance review, contract triage. Stage 7 used to be the top of the ladder; the framework had a ceiling at “run your own orchestrator.” Gas City raised the ceiling, and what now sits above the old ceiling is no longer a developer-tools question. It’s an operations question, and probably an org-design one.
For a serious user, the most useful thing about the ladder is that it tells you not to skip rungs. The Stage 9 win comes from having a working Stage 7–8 system to put it on top of. If you’re at Stage 6 today, the next thing to chase is not Gas City. It’s making your work durable enough that any Stage 7–8 orchestrator (Gas Town or somebody else’s) can run on it.
Checkpoint: what you know now.
Gas City is the SDK abstraction over Gas Town. Roles, topology, and orchestration become declarative TOML packs. The same engine that runs a coding swarm can run image moderation, ops triage, anything you can describe as a graph of agents.
The strategic claim is SaaS replacement. A non-technical Comms major has displaced a niche SaaS at her company with a pack. Tim Sehn rewrote 30,000 lines of Dolt’s storage backend in C in a day, for $3,000 in API credits. The territory between those two points is the bet, not yet proven.
Where you are determines what you can install. Yegge’s 11-stage ladder is blunt: Stage 7+ for Gas Town, 5–6 for “not yet,” 1–4 for “don’t even read about it.” The framework grew from 8 to 11 with Gas City because the top of the ladder is no longer about code.
Four ideas worth stealing
Make the work durable, not the agent
This is the central inversion. In a system where any individual component (sessions, agents, even hosts) is unreliable, the way to get reliability is to push state into a substrate that doesn’t share the unreliability. Gas Town’s substrate is git. Yours might be a database, or a queue, or a state machine in S3. The principle is: agents should be stateless and replaceable, work state should be persistent and idempotent. If you’re building anything multi-agent, the first design question is “what does the work look like when no agent is currently holding it?” If you can’t answer cleanly, your system will lose work.
Acceptance criteria beat scripts when the worker is an LLM
NDI is the runtime version of this idea. If you write your workflows as imperative scripts that an LLM has to execute step-by-step, you’ll spend forever debugging the cases where the LLM did something slightly different. If you write your workflows as a sequence of well-specified outcomes (“this bead is done when X is true”), the LLM has room to figure out how to satisfy the criteria, and you can verify completion mechanically. The MAKER paper’s million-step result points at the same conclusion: long LLM workflows complete reliably when each step is atomic, locally checkable, and isolated from the others. Gas Town and MAKER are the production and academic versions of the same principle.
Reliability is a chain of watchers
Boot the Dog watches the Deacon. The Deacon watches the Witnesses. The Witnesses watch the Polecats. At every layer, the watcher is dumber and more reliable than the thing it’s watching. This is how Gas Town keeps a swarm of LLMs aimed at a goal. The pattern is older than Gas Town (it’s how kubelet watches pods, it’s how supervisord watches processes) but it’s particularly sharp here because the things being watched are smart, the watchers don’t have to be, and the chain terminates in a deterministic three-minute heartbeat. If you’re building anything where agents need to keep going without human oversight, the design question is not “how do I make the agents more reliable?” It’s “what’s the simplest thing that can detect when an agent is stuck, and what does it do about it?”
Topology should be declarative, not hardcoded
Gas Town v1 baked seven roles into the binary. Gas City pulled them out into TOML packs you can ship and version. The reason this matters: once you’ve built one swarm that works, the second one shouldn’t require rewriting the orchestrator. It should require writing a config. The crew that watches your image-moderation queue, the crew that runs your nightly compliance review, and the crew that triages customer tickets are the same engine with different topologies. If you’re building anything multi-agent, the question after “what does the work look like” is “what does the topology look like, and can someone who isn’t an engineer change it?” The orchestrators that win the next two years will be the ones where the answer is yes.
Epilogue
A year ago, the interesting engineering question in AI tooling was “how do I make a single coding agent work well.” The answer (RAG, tool use, larger contexts, better prompts) was technically deep and mostly commodity by the end of 2025.
The interesting question this year is one layer up. “How do I make a fleet of coding agents work well.” The honest answer is that we don’t fully know yet. Gas Town is one attempt, and it’s a good one, but it’s a 2026-shaped attempt. The model layer underneath it will keep moving. The agents will get more compliant about acting on their hooks without a fake keystroke. The session size will grow. The orchestrators will consolidate around two or three patterns and the rest will fade.
What I think will not fade is the inversion at the heart of all of this. Work is persistent. Sessions are ephemeral. Agents are identities that connect work to sessions. Reliability comes from substrate, not from making the agents perfect. Acceptance criteria beat imperative scripts when the executor is a model. Watchers compose hierarchically. Topology is declarative. Whatever orchestrator we use a year from now, those facts will still be true.
If you’re at Stage 5 or 6 (one or a few coding agents, hand-managed) the lesson is not “rush to install Gas Town.” Yegge himself begs you not to, repeatedly, on the grounds that you’ll wreck your repo. The lesson is to start writing your work down. Get it out of your head and your chat history and into a thing the next session can read. Once your work is durable, you can swap in any orchestrator you want.
We’re going to start running this experiment ourselves at Boon, in a small way, this quarter. I’ll write up what I learn.
References
- Welcome to Gas Town. Steve Yegge, January 2026 (Medium).
- The Future of Coding Agents. Steve Yegge, January 2026 (Medium).
- Welcome to the Wasteland. Steve Yegge, March 2026 (Medium).
- Gas Town: from Clown Show to v1.0. Steve Yegge, April 2026 (Medium).
- Welcome to Gas City. Steve Yegge, April 2026 (Medium).
- A Week In Gas Town. Tim Sehn (DoltHub), March 2026 (DoltHub blog). The Dolt-in-C case study cited above.
- Solving a Million-Step LLM Task with Zero Errors (MAKER). Meyerson et al., Cognizant + UT Austin, November 2025 (arXiv:2511.09030).
- Gas Town source. github.com/gastownhall/gastown.
- Gas City source. github.com/gastownhall/gascity.
- Beads source. github.com/gastownhall/beads (moved from steveyegge/beads at v1.0).
- Dolt (git-versioned SQL database). github.com/dolthub/dolt.
- Bors-NG (the bisecting merge queue Gas Town’s Refinery is modeled after). github.com/bors-ng/bors-ng.