Working notes from the bench.

What we shipped, what broke, and what we changed our minds about. Written for operators, not the timeline.

Builders Log · Jun 6, 2026 · 13 min

The Eval Was Lying to Me

I built a research skill the modern way: handed my expertise to an LLM and layered review tools on top. It looked rigorous, until I verified the gold-standard example I was shipping and found it broken. Here's what rebuilding the gate taught me about whether you can trust your own evaluation.

Read article →

Builders Log · Mar 2, 2026 · 10 min

I Turned 21,000 Lines of Code Into 43 Files.

I spent a month building a full-stack application: server, API, 11 pipeline phases, 21,000 lines of code. The thing that shipped was 43 files in a folder smaller than a hero image.

Read article →

Builders Log · Feb 24, 2026 · 14 min

Code or SDK: When You Actually Need the Agent SDK

We built one discovery pipeline on the Agent SDK. Then we built the same thing in Claude Code. Here’s what both approaches actually require — tested across two client engagements.

Read article →

Builders Log · Jan 27, 2026 · 12 min

Why I Don’t Let My AI Agents Plan

When the process is known, fixed workflows beat autonomy. Here are the guardrails we now use.

Read article →

Automation · Jan 27, 2026 · 8 min

The Run-Based Collection Loop: Stop chasing responses by hand.

A repeatable 3-flow system for collecting updates, tracking status, and chasing non-responders automatically.

Read article →

Work With Us

If this is the kind of work you're trying to ship, let's talk.

We run 4–8 week Workflow Builds with operators who want their AI to actually run in production. Bring the messiest workflow you have.

Book a discovery call

No obligation · Clarity either way