I stopped writing the code. I did not stop being the engineer.

The question

Every dev who has seen how I work lately asks some version of the same thing: “wait, so what do you actually do?”

Fair question. In one recent stretch I shipped more than a dozen pull requests to staging — a whole design overhaul across the landing page, onboarding, the dashboard, five journey stages, an occupations funnel, a new tool, and settings. I reviewed all of it. I decided all of it. I wrote almost none of it. A lot of it I steered from my phone, walking around, tapping into a preview and firing back “the stage card looks cluttered” or “that button feels like a no-op.”

So the honest answer to “what do you do” is: I do the deciding, the taste, and the verifying. The typing is delegated. But delegated is not the same as abandoned, and the gap between those two words is the whole job.

The model: one brain, many hands

The mental model I settled on is a central nervous system. There is one place where judgment lives — the main thread I am talking to — and it does not do grunt work. When there is a real chunk of building to do, it spins up a subagent to do it, hands that agent a tight brief, and waits for the result. Several of them run at once when the work is independent, each in its own lane, sometimes each in its own isolated copy of the repo so they cannot collide.

The main thread’s job is not to write files. It is to hold the plan, split the work, and then be the thing that does not trust the workers by default. Every agent comes back with a report that says “done, all green.” That report is a claim, not a fact. The interesting part of the workflow is everything that happens between the claim and me believing it.

Why bother with the layering? Because context is the scarce resource. If the one thread that holds the whole plan also reads every file and runs every command, it drowns. Keep it at altitude, push the execution and the file-reading down to workers, and it can hold a much bigger picture for much longer. It is the same reason a staff engineer does not also do all the typing: not because they cannot, but because their attention is the bottleneck.
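To make the shape concrete, here is a minimal sketch of the fan-out, not the actual tooling. It assumes workers are plain functions that take a brief and return a report. The names `Brief`, `Report`, `run_in_parallel`, `fake_worker`, and `fake_verify` are all made up for illustration. The part that matters is the last line of `run_in_parallel`: a worker's "done" only counts when an independent check agrees.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Brief:
    """A tight, self-contained task description handed to one worker."""
    name: str
    instructions: str


@dataclass
class Report:
    """What a worker says happened. This is a claim, not a fact."""
    brief: Brief
    claimed_done: bool
    summary: str


def run_in_parallel(
    briefs: List[Brief],
    worker: Callable[[Brief], Report],
    verify: Callable[[Report], bool],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Fan independent briefs out to workers, then verify each claim.

    A brief maps to True only when the worker claimed success AND the
    independent verify step agreed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        reports = list(pool.map(worker, briefs))
    return {r.brief.name: r.claimed_done and verify(r) for r in reports}


# Hypothetical stand-ins so the sketch runs on its own.
def fake_worker(brief: Brief) -> Report:
    # Every worker reports success, whether or not it is true.
    return Report(brief, claimed_done=True, summary="done, all green")


def fake_verify(report: Report) -> bool:
    # Stand-in for a real check against deployed staging.
    return report.brief.name != "dashboard"


briefs = [
    Brief("landing", "Apply the new design to the landing page."),
    Brief("dashboard", "Apply the new design to the dashboard."),
]
results = run_in_parallel(briefs, fake_worker, fake_verify)
```

Both workers report "done, all green", but only the landing page survives verification. The orchestrator never reads a file itself; it only holds the briefs and the verdicts.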

Verify, do not infer

The single rule that makes this safe is boring: prove it, do not assume it.

Every change lands on a staging branch behind a password, never straight to production. And “it deployed” is not proof of anything. Proof is going and looking. After a deploy I have the system fetch the actual page and assert the actual thing changed: the fabricated number is gone from the rendered HTML, the anonymous dashboard returns a 200 and not a redirect, the computed font on the body is really the one I intended and not a silent fallback. When I said “the pre-register button does not notify me,” we did not reason about whether the fix was correct. We fired a real pre-register at live staging and then read my Slack channel and watched the message arrive. Verified, not inferred.
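A post-deploy assertion of this kind can be sketched as a pure function over whatever the fetch returned. This is an illustration under assumptions, not the real check: `PageSnapshot` and `verify_page` are hypothetical names, the fetch itself (done without following redirects) is left out, and the sample HTML is made up apart from the fabricated figure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PageSnapshot:
    """What came back from fetching one staging URL, redirects not followed."""
    status: int
    html: str


def verify_page(
    page: PageSnapshot,
    must_not_contain: Iterable[str] = (),
    must_contain: Iterable[str] = (),
) -> List[str]:
    """Return a list of failures; an empty list means the claim held up."""
    failures = []
    if page.status != 200:
        failures.append(f"expected 200, got {page.status}")
    for text in must_not_contain:
        if text in page.html:
            failures.append(f"forbidden text still rendered: {text!r}")
    for text in must_contain:
        if text not in page.html:
            failures.append(f"expected text missing: {text!r}")
    return failures


# Made-up snapshots standing in for real fetches of staging.
before_fix = PageSnapshot(200, "<p>12,847 already sorted</p>")
after_fix = PageSnapshot(200, "<p>Join the queue</p>")
redirected = PageSnapshot(302, "")

still_fabricated = verify_page(before_fix, must_not_contain=["12,847"])
clean = verify_page(after_fix, must_not_contain=["12,847"])
bounced = verify_page(redirected)
```

Returning a list of failures instead of a bare boolean keeps the evidence: the report says which assertion broke, so "not proven" comes with a reason attached.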

This sounds obvious. It is not what most people do. Most “AI wrote it and the tests pass” workflows stop at the claim. The claim is where the bugs hide.

The failures are the point

Here is the part the polished threads leave out. This process goes wrong constantly. What makes it work is not that the AI is right; it is that the wrongness gets caught.

The first attempt at the redesign came back the wrong colour entirely. Teal, when the design was forest green. It had built faithfully from a stale token file instead of the actual rendered design, and it looked confident doing it. I caught it in one glance: “looks nothing like the prototype.” That is a taste judgment. No test catches “wrong vibe.”

Twice, the system cheerfully rendered fabricated data to users — invented queue numbers, a made-up “12,847 already sorted” that traced back to nothing. I caught those on my phone too, walking the staging site, because a number that is too round or too confident sets off a human alarm that a green test suite never will.

And some failures the system caught on itself, which is the good kind. A schema change drifted out of version control and turned into a broken deploy; a continuous-integration gate we had built weeks earlier for exactly this reason went red and blocked it. A whole application had been silently rendering in the browser’s fallback serif for who knows how long, because a font variable was declared in the wrong place; the render-and-check step surfaced it. The lesson is not “the AI is unreliable.” The lesson is that you build a system where breadth and speed come from the machine, and judgment and taste stay with the human, and enforcement catches the overlap between them.

The part I am proudest of: convention versus enforcement

A dev asked me a sharp question mid-build: how do I make sure a new rule I care about actually sticks?

It forced me to admit something. My “ship checklist” is eleven steps: clear the caches, typecheck, run the full test suite, build, run the end-to-end tests, check for duplication, enforce the writing style, commit, open the pull request and wait for its checks, watch the deploy, then smoke-test the live result. Eleven steps. But when I counted honestly, only five of them were actually enforced by machinery that runs whether I am careful or not. The other six were discipline. They hold because I am paying attention, which means they are exactly as reliable as my attention, which is to say: not a guarantee.

So the real answer to “how do you make a rule stick” is: you do not put it in your routine and call it done. Routine is soft. You graduate it into continuous integration — a check that runs on every change, for everyone, forever, including a future tired version of me. When the dev asked me to make a duplication check standing, I did not just add it to my habit. I wrote the habit down so it survives, and then I opened a ticket to build the real, machine-enforced version, because only that one truly stands.

That reframed the whole workflow for me. The number that matters is not how many steps my checklist has. It is how many of them hold when I am not looking. Every good instinct I have — “never ship fabricated data,” “every schema change is a committed file” — starts life as my discipline and is only really safe once it is a gate that does not need me. The work is the migration from the first to the second.
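As an illustration of what graduating a habit into a gate can look like, here is a rough sketch of a duplication check that a pipeline could run on every change. It is an assumption about the shape, not the real check: the function names, the four-line window, and the exact-match-after-stripping rule are all choices made for the example.

```python
import sys
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicate_blocks(
    files: Dict[str, str], window: int = 4
) -> Dict[Tuple[str, ...], List[Tuple[str, int]]]:
    """Find runs of `window` consecutive non-blank lines that occur more than once.

    `files` maps a path to its text. Each repeated block (a tuple of stripped
    lines) maps to the sorted (path, line number) places where it starts.
    """
    seen = defaultdict(list)
    for path, text in files.items():
        # Keep original 1-based line numbers while ignoring blank lines.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), 1)
            if line.strip()
        ]
        for i in range(len(lines) - window + 1):
            block = tuple(line for _, line in lines[i:i + window])
            seen[block].append((path, lines[i][0]))
    return {b: sorted(places) for b, places in seen.items() if len(places) > 1}


def gate(files: Dict[str, str], window: int = 4) -> int:
    """Return the exit code the pipeline should see: 0 when clean, 1 otherwise."""
    duplicates = find_duplicate_blocks(files, window)
    for places in duplicates.values():
        print(f"duplicated {window}-line block at {places}", file=sys.stderr)
    return 1 if duplicates else 0


# Made-up file contents for the example.
sample = {
    "a.py": "x = 1\ny = 2\nz = 3\nw = 4\n",
    "b.py": "q = 0\nx = 1\ny = 2\nz = 3\nw = 4\n",
}
exit_code = gate(sample)
```

Wired into continuous integration, a script like this would end with `sys.exit(gate(...))`, so any duplication turns the build red whether or not anyone remembered to look.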

So, the actual workflow

Stripped down:

I decide what and why. That stays with me, always. The AI never picks the direction.
I write a tight brief and delegate the building, often several pieces in parallel, each isolated.
I treat every “done” as a claim. Nothing is believed until it is proven on real, deployed staging with a concrete assertion.
I stay the taste layer. Wrong colour, cluttered card, a number that smells invented, a button that feels dead — those are mine to catch, and I catch them by actually using the thing, often from my phone.
When a rule matters, I do not trust myself to remember it. I turn it into something a machine enforces.

None of this makes me less of an engineer. If anything it concentrates the engineering into the parts that were always the actual job: what to build, whether it is really done, and how to make good behaviour the default instead of the hope. The typing was never the job. It just used to be where all the time went.

