Builders Log · Jun 6, 2026 · 13 min
The Eval Was Lying to Me
I built a research skill the modern way: handed my expertise to an LLM and layered review tools on top. It looked rigorous, until I verified the gold-standard example I was shipping and found it broken. Here's what rebuilding the gate taught me about whether you can trust your own evaluation.
Builders Log · Mar 2, 2026 · 10 min
I Turned 21,000 Lines of Code Into 43 Files.
I spent a month building a full-stack application: server, API, 11 pipeline phases, 21,000 lines of code. The thing that shipped was 43 files in a folder smaller than a hero image.
Builders Log · Feb 24, 2026 · 14 min
Code or SDK: When You Actually Need the Agent SDK
We built one discovery pipeline on the Agent SDK. Then we built the same thing in Claude Code. Here’s what both approaches actually require — tested across two client engagements.
Builders Log · Jan 27, 2026 · 12 min
Why I Don’t Let My AI Agents Plan
When the process is known, fixed workflows beat autonomy. Here are the guardrails we now use.
Automation · Jan 27, 2026 · 8 min
The Run-Based Collection Loop: Stop chasing responses by hand.
A repeatable 3-flow system for collecting updates, tracking status, and chasing non-responders automatically.