The most expensive tester in my company is me
Every few days I walk the staging build on my phone. I tap through the journey the way a user would, and I find things. A heading in the wrong font. A card labelled “latest round” that showed the oldest one. A button that said “view budget tool” and dropped me on the settings page. A pre-register tap that never pinged Slack.
For a while I was quietly proud of this. Founder attention to detail, catching what others miss. Then I noticed the pattern, and it stopped being flattering.
Every one of those bugs reached the most expensive checkpoint in my whole company before anyone caught it: my eyes, on production-adjacent staging, days after the code was written. And none of them needed my judgment to find. A font is either the one I chose or it isn’t. A link either lands where its label says or it doesn’t. The “latest” round is either the newest date or it isn’t. These are not taste calls. They are facts a computer can check in milliseconds, and instead they rode all the way to the far right of the pipeline to be caught by the person whose time is worth the most.
That is exactly backwards. And it has a name.
Shift left
Picture how code travels: you write it, CI lints and tests it, it builds, it deploys, it lands on staging, and eventually it reaches a real user. Left to right. The iron law of quality is that a bug gets more expensive the further right you catch it. A failing assertion at commit time costs ten seconds and nobody ever knows. The same bug caught by a user costs a support ticket, a context switch, a redeploy, and a little bit of trust.
“Shifting left” just means moving each check as far left as it can go, ideally to the point where a machine blocks the bad code automatically, right next to where it was born.
When I map my staging-walk catches onto that line, they are all bunched up on the far right. The work is to walk each one leftward until it is a gate that fails a build, instead of a note I type on my phone.
Graduating a catch into a gate
I had already started doing this without naming it. A while back I asked my build system a simple question: how do you keep the code DRY as it grows? The answer was not “try harder.” It was a check. We added a duplication ratchet to CI, a registry of route constants, and a lint rule that fails the build if someone hardcodes a route string. Copy-paste creep went from “something I hope nobody does” to “something the pipeline will not let you merge.” (I wrote that one up on its own.)
That is the template. Every recurring class of bug I catch by eye should graduate the same way, from “Asif notices on staging” to “the pipeline blocks it automatically.” Not one heroic test suite. A conveyor belt, where each thing that burns me once becomes a gate so it cannot burn me twice.
Here is what that conveyor looks like for the catches above:
- Wrong font, wrong round, dead links, legacy styling buried deep in a page: a post-deploy canary that screenshots the key routes and diffs them, crawls every button and link to confirm it lands where its label promises, and asserts a few data truths (the round shown is the newest one). It runs itself after every deploy and posts the result to Slack. My walk becomes reviewing exceptions, not hunting for them.
- The silent pre-register ping, and the silent report failure that once cost me real signups: a synthetic user that runs signup, pre-register, and report generation end to end against staging and shouts if the Slack ping or the report never shows up.
- Off-brand colours, shouty capitals, the wrong icons: a design lint that fails on anything outside the system.
- The lockdown work protecting the paid dataset: a permission snapshot in CI that fails the build the instant a protected table becomes readable by an anonymous visitor. This one matters most, because that dataset is the product.
The safety net I did not know had a hole
While setting this up I forced a full run of our end-to-end tests, something CI was not doing on every change. Nineteen of them were red. They had gone red quietly, somewhere in the middle of a big redesign, and nobody knew, because nothing was watching them. The founder walking staging had quietly become the only test that ran.
That is the whole argument in one incident. Safety nets do not fail loudly. They fray in silence, and the fraying stays invisible until the thing they were meant to catch lands in your lap. The fix is not more discipline. It is putting something automatic in the path that shouts the moment the net has a hole.
The first gate caught something on day one
I did not have to wait long for the payoff. One of the first gates I built guards the dataset that is the actual product, the historical immigration data. It rebuilds the entire database from its committed migration history and then checks that none of the protected tables or functions are reachable by an anonymous visitor. If a future change ever accidentally opens a door, the build fails before it can merge.
The very first time it ran, it failed. Not because of anything I had just changed, but because it found a function living in the production database that does not exist anywhere in our migration history. Someone — which is to say me, at some point — had created it directly and never wrote it down. The database and the written record of the database had quietly drifted apart, and nothing had ever noticed, because until now nothing had ever looked.
That is the rotted-tests lesson again in a different outfit. Drift does not announce itself. The only reason I know about it today is that I finally built something whose entire job is to look. The gate even proved itself on the way in: I fed it the exact mistake it exists to catch, an accidental public grant on a sensitive function, and it stopped the build cold and named the open door.
A gate that catches a real problem on its first day is not a gate I needed this week. It is a gate I needed the whole time and did not have.
The gate I bypassed
Here is the part that keeps me honest. After building all of that, I bypassed one of my own gates.
I was merging a small docs change, and to save the wait I let a quick throwaway script do the poll-and-merge. It had a bug: it read the list of not-yet-started checks as an empty list, decided “nothing is pending, so everything must be green,” and merged before CI had even begun. The build check then went red. It was only docs, so nothing broke, but the fact stands: I merged red, through a gate I had told myself was there.
Then I looked closer and found the worse thing. Nothing on the platform had stopped me. Our “do not merge red” rule was enforced entirely by good behaviour, by every agent and every script choosing to wait. A gate that depends on everyone choosing to respect it is not a gate. It is a suggestion with a nice name.
The proper fix, GitHub’s own branch protection, costs money on a private repo, and I did not want to pay for a checkbox. So I built the free version: a single merge script that is now the only sanctioned way anything lands, one that waits for every check to actually register and pass and refuses otherwise, and is specifically unable to make the empty-list mistake mine made. The next automated worker that used it could not fumble the way I had.
The lesson doubled back on me. The whole point of shifting left is to stop relying on a human choosing to catch things. It turns out that applies to the human building the gates too. Enforcement has to live in the tooling, not in anyone’s discipline, mine included.
Why this is the founder’s job to insist on, not to perform
I am not trying to remove myself from quality. I am trying to spend my attention on the things only I can judge: whether the product feels right, whether the story lands, whether we are building the thing worth building. Every hour I spend being a human linter is an hour stolen from that.
So the rule I am adopting is simple. The first time a bug reaches my eyes, fine. The second time a bug of the same class reaches my eyes, that is a process failure, and the fix is a gate, not a resolution to look harder.
The cap_drop note saved me an afternoon. Hadn’t thought about the TTY issue during build at all.
Curious whether you stuck with Ollama in the end, or went back to the Copilot model once the 403 cleared up?