Code or SDK: When You Actually Need the Agent SDK
We built a multi-agent discovery pipeline on the Agent SDK: four agents, strict tool restrictions, structured audit logging, code-driven orchestration. Then we rebuilt the same capability in Claude Code. Most of the SDK wasn't necessary.
The first pipeline was for a cleantech venture organization. It scans 25+ sources — funding announcements, competition results, regulatory approvals — extracts companies, scores them against investment themes, verifies eligibility, and produces reports. Python code drives the pipeline. The model handles judgment.
Then a second client came along, a company in Western Canada. Same need: scan sources weekly, extract opportunities, score relevance, persist to a database, export reports. Our initial scope called for the SDK. Same architecture, different domain.
Then we read an article about Claude Code transforming finance workflows. The claim was straightforward: define your process as a skill, let Claude execute it, schedule it with cron. No SDK. No custom orchestration code. The same model intelligence, running inside a general-purpose coding tool.
It raised a testable question: how much of what we built for Client A actually required the SDK?
We decided to test it. Client B's pipeline is being built entirely in Claude Code: skills, hooks, MCP servers, the Task tool for parallel agents, claude -p for headless execution, cron for scheduling. The architecture is built. The pilot is in progress.
This is what we've found. Not a definitive verdict. A snapshot from two real builds.
Two Tools, Same Model
Claude Code and the Claude Agent SDK both use the same underlying models: Sonnet, Haiku, Opus. What differs is who drives and what infrastructure you own.
Claude Code is an agentic coding tool available in the terminal, IDE, desktop, and web. Claude reads code, runs commands, writes files, calls external services. You extend it with skills (reusable workflow definitions with frontmatter configuration, dynamic arguments, and scoped hooks), subagents (isolated specialists spawned via the Task tool, with optional frontmatter hooks for per-agent tool control), hooks (lifecycle automation that runs before or after tool calls), and MCP servers (external integrations). In headless mode (claude -p), it runs unattended, schedulable via cron with --allowedTools for whitelisting and --max-budget-usd for cost caps.
The Agent SDK is a programmatic framework for engineers building systems. Your code creates agents, defines tool access, registers hooks, and runs the agent loop. You extend it with custom orchestration code, hook-based validation, structured output schemas, and MCP tool servers. It lives inside your application; you can wrap it with Express, serve a React dashboard, expose REST endpoints.
| | Claude Code | Agent SDK |
|---|---|---|
| Interface | Terminal / IDE / Web | Programmatic API |
| Primary user | Developers | Engineers building pipelines |
| Who drives | Human (terminal) or headless (claude -p) | Code |
| Extends via | Skills, hooks, subagents, MCP servers | Custom orchestration code, hooks, MCP servers |
| Hooks / validation | PreToolUse (exit code 2 = block) / PostToolUse (logging); per-subagent via frontmatter | Hook-based blocking (denial fed to model) |
| Output validation | --json-schema (pipeline-level) | Constrained decoding (per-agent) |
| Audit logging | PostToolUse hooks (JSONL or custom) | Structured logging at every stage |
| Unattended execution | claude -p + cron + --allowedTools | Native (code-driven loop) |
| Multi-model support | Per-subagent via frontmatter (model: haiku) | Per-agent via code (runtime selection) |
| Cost model | Subscription ($20–$200/mo) or API pay-per-token | API pay-per-token |
The distinction matters because it determines what you maintain, what you pay, and when you need to write custom infrastructure versus using what Claude Code already provides.
The Assumptions We Tested
Before building Client B, we mapped the prevailing assumptions, shaped by our Client A build and common across the ecosystem: production pipelines need the SDK. Claude Code is for interactive development. Anything autonomous, validated, or auditable requires programmatic control.
We tested each assumption against Claude Code's actual capabilities. Here's what held up, and what didn't.
Unattended, Scheduled Execution
The assumption: Pipelines that run on a schedule need the SDK. Claude Code is interactive.
What testing shows: Claude Code's headless mode (claude -p) runs without a human present. Client B's weekly pipeline is orchestrated by a shell script that passes a discovery prompt to claude -p with a tool allowlist and budget cap. Slash command invocation (/discover) is interactive-only, but model-invoked skills still work in print mode; Claude auto-loads them when the task matches their description. A cron job triggers it every Monday morning. The pipeline fetches sources, extracts opportunities, scores them, saves to a database, and logs results, all while nobody's watching.
Takeaway: Headless mode with --allowedTools and --max-budget-usd turns Claude Code into a schedulable pipeline runner. The human-in-the-loop assumption doesn't hold.
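As a concrete sketch, the wrapper a cron job invokes can be very small. This assumes Python; the script name, prompt text, tool list, and log layout are illustrative, and `--output-format json` is one way to capture structured output. The `--allowedTools` and `--max-budget-usd` flags are the ones described above.

```python
import subprocess
from datetime import datetime, timezone

def build_discover_cmd(prompt: str, allowed_tools: list[str], budget_usd: float) -> list[str]:
    """Assemble a headless claude -p invocation with a tool allowlist and budget cap."""
    return [
        "claude", "-p", prompt,
        "--allowedTools", ",".join(allowed_tools),
        "--max-budget-usd", str(budget_usd),
        "--output-format", "json",
    ]

def run_weekly_discovery() -> None:
    """Entry point for the cron job, e.g.:  0 7 * * 1 python3 run_discovery.py"""
    cmd = build_discover_cmd(
        prompt="Run the weekly discovery workflow.",
        allowed_tools=["WebFetch", "WebSearch", "Task"],
        budget_usd=8.0,
    )
    # One timestamped output file per run, as described above.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    with open(f"logs/discover-{stamp}.json", "w") as log:
        subprocess.run(cmd, stdout=log, check=True)
```

Keeping command assembly in its own function makes the allowlist and budget testable without actually invoking the CLI.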
Hook-Based Tool Validation
The assumption: Enforcing tool restrictions requires the SDK's hook system. Prompt-based restrictions can be bypassed; code-based restrictions cannot.
What testing shows: Claude Code has its own hook system: 17 lifecycle events including PreToolUse, PostToolUse, SessionStart, SubagentStart, PreCompact, and more. Hooks defined in .claude/settings.json apply project-wide. Hooks defined in subagent frontmatter (.claude/agents/*.md) apply only while that specific subagent is active. This is the intended mechanism for per-agent tool control. A PreToolUse hook receives the tool name and input as JSON on stdin. If it exits with code 0, the tool call proceeds. If it exits with code 2, the call is blocked and the model receives the rejection reason.
Client B uses a PreToolUse hook (validate-lead.js) that enforces geography gating: only leads in the client's target provinces pass through. It also validates field types, score ranges, and required fields. Invalid saves are blocked before they reach the database. This is code-enforced validation, not a prompt suggestion.
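The same gating logic, sketched in Python rather than the client's actual validate-lead.js. The field names, province allowlist, and 0–100 score range are illustrative; the stdin payload shape and exit-code convention are Claude Code's.

```python
import json
import sys

# Illustrative geography gate; the client's actual province list differs.
TARGET_PROVINCES = {"AB", "BC", "SK", "MB"}

def validate_lead(lead: dict) -> tuple[bool, str]:
    """Return (ok, reason). Pure logic, so it is easy to unit-test."""
    for field in ("name", "province", "score"):
        if field not in lead:
            return False, f"missing required field: {field}"
    if lead["province"] not in TARGET_PROVINCES:
        return False, f"province {lead['province']} is outside the target geography"
    if not isinstance(lead["score"], (int, float)) or not 0 <= lead["score"] <= 100:
        return False, "score must be a number in [0, 100]"
    return True, "ok"

def main() -> int:
    # Claude Code sends the tool call as JSON on stdin:
    # {"tool_name": "save_lead", "tool_input": {...}}
    event = json.load(sys.stdin)
    ok, reason = validate_lead(event.get("tool_input", {}))
    if not ok:
        print(reason, file=sys.stderr)  # the rejection reason is fed back to the model
        return 2  # exit code 2 blocks the tool call; 0 lets it proceed
    return 0

# As an installed PreToolUse hook, the script would end with: sys.exit(main())
```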
Structured Audit Trails
The assumption: Audit logging requires SDK-level instrumentation. Claude Code sessions are ephemeral.
What testing shows: A PostToolUse hook (audit-save.js) appends a JSON entry to logs/audit.jsonl after every save_lead call: timestamp, tool name, input, result. Each cron run also captures its full output to a timestamped JSON file. The result is a queryable audit trail: every lead saved, every source scanned, every run logged.
Is it as sophisticated as Client A's 11-stage audit tracker with per-source metrics? No. Is it sufficient for a weekly discovery pipeline? So far, yes.
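A minimal Python equivalent of that PostToolUse hook. The log path and record fields mirror the description above; the exact payload shape reaching the hook is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("logs/audit.jsonl")

def append_audit_entry(tool_name: str, tool_input: dict, result: str) -> dict:
    """Append one JSONL audit record per save_lead call; returns the record written."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "input": tool_input,
        "result": result,
    }
    AUDIT_LOG.parent.mkdir(parents=True, exist_ok=True)
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Because each line is standalone JSON, the trail stays queryable with jq or a ten-line script, which is what "queryable audit trail" amounts to in practice.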
Tool Whitelisting
The assumption: Restricting which tools an agent can access requires SDK agent definitions with explicit allowlists.
What testing shows: The --allowedTools CLI flag auto-approves listed tools without permission prompts. In headless mode (claude -p), this effectively acts as a tool whitelist: unlisted tools can't get human approval, so they can't execute. Client B's cron scripts define per-stage allowlists: the discovery script permits WebFetch, WebSearch, Task, and specific MCP tools. The export script permits Read, Write, and database query tools. Different stages, different permissions, no SDK required.
Parallel Agent Execution
The assumption: Running multiple agents in parallel requires the SDK's orchestration primitives.
What testing shows: Claude Code's Task tool spawns subagents. Client B's discover skill divides sources into batches of four and spawns a source-scanner agent for each batch using multiple Task calls in a single message. The agents run in parallel, each fetching and scoring their assigned sources. Results are aggregated when all agents return. One constraint: subagents can't spawn further subagents, so the parallelism is one level deep.
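The fan-out itself is just a partition of the source list; a sketch of the batching the skill performs before issuing one Task call per batch:

```python
def batch_sources(sources: list[str], size: int = 4) -> list[list[str]]:
    """Split the source list into fixed-size batches, one source-scanner subagent per batch."""
    return [sources[i:i + size] for i in range(0, len(sources), size)]
```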
Budget Control
The assumption: Cost management requires per-token tracking at the API level.
What testing shows: The --max-budget-usd flag caps spending per invocation. Client B's cron scripts set explicit budgets: $8 for discovery, $3 for deep research, $2 for export. If a run approaches the cap, it stops. On a subscription plan, these caps calculate nominal API-equivalent costs as a safety guardrail: subscription usage is tracked against your plan's allocation, not per-token charges. Not as granular as per-agent token tracking, but effective as a cost boundary.
Takeaway: The list of genuinely SDK-exclusive capabilities is shorter than the ecosystem assumes. Claude Code's hooks, headless mode, --allowedTools, and Task tool cover more production ground than is commonly recognized. The caveat: Client B is a pilot. These patterns work at our current scale. We haven't stress-tested them at the volume Client A handles.
Where the SDK Still Wins
Not everything translated. Some capabilities remain genuinely SDK-only, and they matter for specific use cases. The list is shorter than commonly assumed.
Per-Agent Token and Cost Tracking
The SDK tracks token usage per subagent. You know exactly how much the entity extractor cost versus the relevance scorer. This feeds into cost optimization (is the prefilter worth the tokens?) and client billing (what does each pipeline stage cost?).
Claude Code's --max-budget-usd is per-run. You know the total cost of a discover invocation, but not the cost breakdown across the four source-scanner agents it spawned. For pipelines where per-stage cost attribution matters — whether for optimization or compliance reporting — this is a real gap.
Programmatic Retry with Exponential Backoff
With the SDK, your code drives the agent loop, so you can implement custom retry logic with exponential backoff. When a network timeout or rate limit hits, your code waits and retries with increasing delays. Client A's orchestrator does exactly this. You write it once and apply it to every agent.
Claude Code relies on the model's own retry behavior when tool calls fail. The model generally handles transient failures well, but you don't have programmatic control over the retry strategy. For pipelines with strict latency requirements or high failure rates on specific sources, custom retries in your own code provide more predictable behavior. That said, retry strategy alone is rarely the deciding factor between Claude Code and the SDK; it typically matters only when combined with other SDK-only requirements.
Per-Agent Schema Enforcement
The SDK enforces output schemas through constrained decoding: the schema is compiled into a grammar that prevents malformed output from being generated at all. No retries needed. Each agent in the pipeline can have its own schema, so the entity extractor, relevance scorer, and report generator each produce guaranteed-valid output.
Claude Code now supports --json-schema in print mode, which enforces a schema on the final pipeline output via constrained decoding. But it applies at the claude -p invocation level, not per-subagent. Client B handles validation through MCP server-side schemas (Zod for score ranges, field types, enum values) and the PreToolUse hook for additional checks. For pipelines where every agent's output must conform to a formal schema, the SDK's per-agent constrained decoding is more comprehensive.
Application Wrapping
The SDK lives inside a Python or TypeScript application. Client A has an Express API serving entity data, a React dashboard for browsing discoveries, REST endpoints for export generation, and a health check for monitoring. The SDK is one component in a larger application.
Claude Code pipelines can persist data to databases: Client B writes leads to SQLite via MCP, and a separate web app could query that same database to serve a dashboard. But the pipeline and the dashboard are two separate systems. If other services need to call your pipeline via REST endpoints, or if you need real-time pipeline progress in a dashboard, the SDK embeds directly in your application as a single deployment.
The Economics: Subscription Leverage
Here's the argument that shapes the decision more than features do: a Claude subscription covers Claude Code. Every task you accomplish within it — including headless pipeline runs — has zero marginal token cost within your allocation. Note: subscription usage is shared across Claude Code, claude.ai, and Claude Desktop; heavy use in one surface reduces availability in others.
A Claude Pro subscription is $20/month with a rolling usage allocation. Max plans scale to $100 (5x capacity) and $200 (20x capacity).
The Agent SDK runs on the Anthropic API. Every token has a price:
| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Haiku | $1 | $5 |
| Sonnet | $3 | $15 |
| Opus | $5 | $25 |
Client A's discovery pipeline costs an estimated $15–60 per run in API tokens. Weekly runs put that at $60–240/month. Client B's pipeline runs within a subscription, estimated at $8 in API-equivalent compute for discovery, $3 for deep research, $2 for export, roughly $13 per weekly cycle (as measured by --max-budget-usd, which estimates costs at API rates). On a Pro plan, that's covered. On a Max plan, it's well within allocation.
$20/month vs. $60–240/month for equivalent work. That's significant.
Both tools support multi-model routing: Claude Code subagents can specify model: haiku in their frontmatter, and the SDK assigns models programmatically. The SDK's economic advantage is narrower than it appears: it comes from per-agent cost tracking (knowing exactly what each stage costs) and access to the Batch API (a general Anthropic API feature offering a 50% discount for non-time-sensitive work, available to any API user but not to subscription-based Claude Code), not from model routing exclusivity.
But there are real limits on the subscription side. Pro usage allocation works on a 5-hour rolling window, and heavy agentic tasks can exhaust it well before the window resets. Multi-step agentic tasks consume allocation faster than simple chat. The Max plan at $200/month handles heavy workloads, but at that price, you're approaching API cost territory for moderate pipelines.
The honest take: For pipelines that run weekly with moderate source counts, the subscription wins on cost. For pipelines that run daily at high volume where per-agent cost visibility and batch discounts matter, the SDK's economics are better. The crossover point depends on volume and cost attribution needs.
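The crossover arithmetic, using the article's own figures. This is a simplification that ignores allocation throttles and Batch API discounts.

```python
RUNS_PER_MONTH = 4  # weekly cadence, matching the article's x4 arithmetic

def monthly_api_cost(cost_per_run_usd: float) -> float:
    """Monthly spend at API rates for a weekly pipeline."""
    return cost_per_run_usd * RUNS_PER_MONTH

def cheaper_on_subscription(cost_per_run_usd: float, plan_price_usd: float = 20.0) -> bool:
    """True when API-equivalent monthly spend exceeds the plan price.
    Ignores allocation throttles, which cap what a plan actually covers."""
    return monthly_api_cost(cost_per_run_usd) > plan_price_usd

# Client A: $15-60 per run -> $60-240/month at API rates
# Client B: ~$13 per run   -> $52/month API-equivalent, inside a $20 Pro plan
```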
Industry Context
Financial Services
FINRA's 2026 Annual Regulatory Oversight Report introduced regulatory framing for agentic AI systems in brokerage workflows. Once an AI system can take action — not just generate text — supervisory obligations shift materially.
Key requirements include:
- Supervisory frameworks (Rule 3110) and related supervisory obligations that define authorized actions and escalation points
- Full-chain telemetry capturing intermediate tool calls and decision pathways
- Records retention requirements applicable to AI decision logs
This creates a split that's less binary than it appears.
Claude Code for the developer: Building and maintaining data infrastructure, configuring sources, tuning scoring weights. Interactive, code-adjacent work. Hook-based audit logging via PostToolUse can produce structured JSONL records sufficient for pilot-stage compliance needs.
Agent SDK for the pipeline: When the pipeline needs per-agent cost attribution for billing reconciliation or integration with existing compliance infrastructure via a live API. The structured audit logging — 11 pipeline stages tracked in Client A, per-source metrics, verification data — is more comprehensive than what we've built in Claude Code hooks so far.
Finance heuristic: If the pipeline needs audit trails and tool validation, Claude Code with hooks may be sufficient. If it needs per-agent cost attribution or integration with compliance APIs, use the SDK.
The EU AI Act reinforces the documentation requirements. By August 2026, high-risk AI systems in the financial sector must comply with requirements including documentation, human oversight, and conformity assessments.
Healthcare
Both tools send data to Anthropic's API for inference. The data-in-transit exposure is the same regardless of tool choice.
Healthcare heuristic: For patient data, use Anthropic's HIPAA-eligible API with a BAA (available to API customers and sales-assisted Enterprise subscribers), or consider on-premises models if cloud solutions don't meet compliance requirements. For literature review and research synthesis, Claude Code for technical analysis, SDK for production pipelines that need schema enforcement or a live API.
Legal
Attorney-client privilege creates unique constraints. A key distinction: on the Anthropic API (used by both the SDK and Claude Code in API mode), Anthropic does not train on customer data. Zero Data Retention (ZDR) agreements are available for organizations requiring that no prompts or completions are stored. Consumer plans (Free, Pro, Max) have different data handling terms. A February 2026 SDNY ruling (United States v. Heppner, Judge Rakoff) found that sharing privileged information with consumer-tier Claude waived attorney-client privilege. The court left open whether enterprise tools with contractual confidentiality guarantees — such as ZDR-configured API access — would differ.
Legal heuristic: Interactive analysis requiring professional judgment: Claude Code. Volume batch processing with structured output requirements: SDK. For privilege-sensitive work, use API-based access with ZDR rather than consumer plans. Human review before any output reaches clients, regardless of tool.
Engineering and DevOps
Claude Code's headless mode (claude -p) handles CI/CD integration directly: scheduled runs, tool-restricted execution, budget-capped pipelines. This narrows the SDK's advantage in this domain to per-agent cost tracking and live API requirements.
DevOps heuristic: If it runs in a pipeline with standard output, Claude Code headless. If it needs per-agent cost visibility or serves a live API, use the SDK.
The Decision Framework
After building with both approaches, here's the framework we're using now. It will evolve; the boundary between Claude Code and the SDK keeps shifting as Claude Code gains capabilities.
Is the workflow fixed and repeatable?
├─ No → Claude Code (skills + subagents, interactive)
└─ Yes → Does it run unattended?
├─ No → Claude Code (interactive with hooks)
└─ Yes → Does it need per-agent cost tracking,
per-agent schema enforcement, or application-embedded pipeline logic?
├─ No → Claude Code headless (claude -p + hooks + cron)
└─ Yes → Agent SDK

The questions that push you toward the SDK are narrow and specific:
- Do you need per-agent cost attribution? For billing, optimization, or compliance reporting? SDK only.
- Do you need per-agent schema enforcement? Claude Code supports --json-schema for final pipeline output, but if every agent in the pipeline needs its own schema via constrained decoding, only the SDK supports that at the per-agent level.
- Do you need the pipeline embedded in a larger application? REST endpoints other services call, real-time pipeline status, a single deployment combining pipeline and dashboard? The SDK lives inside your application. Claude Code pipelines are separate processes that write to databases or files.
Everything else — unattended execution, multi-model routing, tool whitelisting, audit logging, parallel agents, MCP integration, budget caps — Claude Code handles.
A note on scale: If you're hitting subscription throttles at high run frequency, the SDK's per-token billing with Batch API discounts offers more predictable economics. But this is a volume threshold, not an architectural requirement, and Claude Code in API-key mode has the same per-token billing, making it a pricing decision rather than a tool decision.
What We'd Tell Someone Starting Today
Start with Claude Code. Define your workflow as a skill. Connect your data sources via MCP servers. Use PreToolUse hooks for validation. Use PostToolUse hooks for audit logging. Schedule with claude -p and cron. Set budget caps with --max-budget-usd.
This is not a consolation prize. It's a pilot-proven, production-capable architecture at moderate scale. Client B's pipeline — source scanning, opportunity extraction, relevance scoring, database persistence, Excel export — runs entirely this way.
Know the signals that mean you've outgrown it:
- You need per-agent cost breakdowns for billing or optimization
- You need the pipeline embedded in an application that other systems consume via API
- You need per-agent schema enforcement via constrained decoding (beyond --json-schema on the final output)
These are the signals we saw with Client A, and they're why Client A is on the SDK. But Client A is a production system scanning 25+ sources with a React dashboard, REST API, and Fly.io deployment. Most pipelines aren't Client A.
The two-phase approach:
Phase 1 — Build on Claude Code. Validate the domain logic. Prove that the sources yield useful signals, the scoring makes sense, the output is actionable. Client B is in this phase. The architecture is built. The domain logic works. The infrastructure cost is a subscription.
Phase 2 — Migrate to SDK when you hit genuine SDK-only requirements. Not "when the pipeline feels serious enough." Not "when you want to productionize." Only when you need per-agent cost tracking, per-agent schema enforcement, or application-embedded pipeline logic. Client A needed these. Many pipelines won't.
The retrospective from Client A still holds: we built the SDK pipeline from day one. It worked. But we believe we'd have validated faster, and started generating value sooner, by starting with Claude Code. Client B is testing that hypothesis, and so far, it's holding.
The Decision in Two Sentences
Both tools run the same model. The difference is infrastructure.
If a developer should be driving — or if the pipeline needs hooks, MCP, cron, and budget caps — use Claude Code. If the pipeline needs per-agent cost tracking, per-agent schema enforcement, or application-embedded pipeline logic — use the Agent SDK.
Most pipelines start as the first kind. Some graduate to the second. That line is further from the SDK than most teams expect.
For how we enforce guardrails in Agent SDK pipelines, see Why I Don't Let My AI Agents Plan. And for an example of what happens when you take the Claude Code approach to its logical endpoint, see I Turned 21,000 Lines of Code Into 43 Files.
References
[1] Anthropic. "Building effective agents." December 2024.
[2] Anthropic. "Claude Code documentation."
[3] Anthropic. "Claude Agent SDK documentation."
[4] Anthropic. "Claude Code hooks."
[5] FINRA. "2026 Annual Regulatory Oversight Report." December 2025.
[6] European Union. "EU AI Act." High-risk AI compliance deadline: August 2, 2026.