Subagents Won't Save You From Vibe Coding
They amplify whatever discipline you bring — for better or worse

The Tool That Didn't Land
I joined CloneForce in October 2025. The team was deep in the work of figuring out how to build production systems with AI — not as autocomplete, but as a real collaborator.
We were pushing hard. Reading everything that came out. Spec-based development. Harness engineering. New patterns showing up every week from teams who'd been at this longer. We tried them. Argued about them. Kept what worked, tossed what didn't. Most of what we ended up with, we built ourselves — taking ideas from the field and shaping them to fit our actual work.
We knew subagents existed. Anthropic had shipped them as a first-class Claude Code feature weeks before I started. We'd seen them in the docs. We'd seen the announcement. But for months, they never really entered our workflow. Claude didn't suggest them. We didn't reach for them. The feature was right there — and it just didn't land.
What I only understood later is why. Subagents need something to do. They need a bounded, independent piece of work with a clear definition of done. When you're moving fast and figuring things out as you go, there's nothing for a subagent to take — no clean slice to hand off. The tool was waiting for preconditions that didn't exist yet.
The preconditions were research and planning. Once we started doing that consistently — real specs, real constraints, real success criteria before writing code — Claude started suggesting subagents. Not because anything had changed in the tool. Because we'd finally built work that had a place for it.
First Contact
The first time I let Claude spawn a subagent, the experience was disorienting in a good way.
I'd given it a research task — something like "figure out how we're currently handling auth across three different services and tell me what's inconsistent." Normally, that would mean Claude reads through a dozen files, summarizes each one, and my conversation fills up with all that intermediate noise. By the time it gave me an answer, the context was already polluted with everything it had looked at along the way.
Instead, Claude dispatched a subagent. I watched it spin up. A few seconds later, I got back a tight, structured summary. No intermediate reads cluttering my conversation. Just the answer.
It felt like I'd hired a contractor. Told them what to look into. Got a report back. Kept my own desk clean.
I was sold before I understood why it worked. That's the trap.
The Dig
Anything that feels like magic is usually something I don't yet understand. So I went looking.
I read the Claude Agent SDK docs. I dug into Anthropic's writeups on multi-agent systems. I went through community threads on how people were actually using subagents — what worked, what didn't, where they were getting burned.
Here's what subagents actually are, once you strip away the mystique:
A subagent is a fresh Claude session spawned from your current one. It starts with an empty context. The only thing it sees is the prompt you hand it. It has its own tools (you decide which). It does its work. It returns exactly one message — its final answer. Then it disappears.
A few things that surprised me:
- No shared memory. The subagent can't see your conversation history. Anything it needs, you have to hand it explicitly in the prompt.
- No nesting. A subagent can't spawn another subagent. It's one layer deep.
- Intermediate work is lost. Whatever the subagent reasoned through along the way — the files it looked at, the dead ends it hit — none of that comes back. You get the summary. That's it.
- It can absolutely write code. This is where community guides led me wrong for a bit. I'd seen posts saying "subagents are researchers, not coders." Not true. Subagents can edit files, run tests, write components. They can do anything the parent can do, within the tools you give them.
The right mental model isn't "researcher vs coder." It's orchestrator and contractor. The orchestrator (your main agent, or you) holds the plan. The subagent is a contractor who takes a slice of that plan, does it, and hands back the result.
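The contractor model can be sketched in a few lines. To be clear, this is an illustration of the mechanics, not the Claude Code API — the function and type names are my own. The point it encodes: the contractor sees only its brief, does its work in isolation, and hands back exactly one message.

```python
from dataclasses import dataclass

@dataclass
class ContractorResult:
    """The single message a subagent hands back; its intermediate work is discarded."""
    final_answer: str

def run_contractor(brief: str, tools: list[str]) -> ContractorResult:
    # Hypothetical stand-in for spawning a subagent. A real subagent starts
    # with an EMPTY context: no conversation history, only the brief and the
    # tools you grant it. It cannot spawn further subagents (one layer deep).
    report = f"Checked auth across services with {tools}: found 2 inconsistencies."
    return ContractorResult(final_answer=report)

# The orchestrator holds the plan and hands off one self-contained slice.
brief = (
    "Task: compare auth handling across services A, B, C.\n"
    "Context you need (you have no memory of our conversation): ...\n"
    "Return: a bullet list of inconsistencies."
)
result = run_contractor(brief, tools=["read_file", "grep"])
print(result.final_answer)  # only the final summary reaches the parent context
```

Notice that everything the subagent needs travels inside `brief` — which is exactly why vague briefs fail, as the next sections get into.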
Distributed Context, Not Smaller Context
There's a common take going around: "Subagents don't solve the context window problem."
Technically true. If your main conversation is already 150k tokens deep, spawning a subagent doesn't shrink it. Your parent context is still stuffed.
But that framing misses the real win.
Here's the picture that actually matters:
One fat context. Everything piled into a single window. As it fills, attention degrades. The model starts missing details. Instruction-following gets sloppier. Hallucinations creep in. You've seen it — the model forgets what you asked for three messages ago because there's too much noise.
Distributed context. A small coordinator context holding the plan, plus many small specialist contexts each focused on one slice of the work. Each agent only sees what's relevant to its task. The coordinator stays lean because the details never arrive there.
The parent is the index. Subagents are the chapters.
This is the architectural move professional AI systems are converging on. Agent teams in the SDK, multi-agent orchestration frameworks, hierarchical agents — they all run on this same idea. You don't make the window bigger. You split the work across windows.
The ceiling moves. A 200k-token problem that would overwhelm one context becomes a tree of 20k-token problems that each fit cleanly.
The honest tradeoff: total tokens across all agents usually goes up, not down. You're paying for focus, not for savings. Each subagent reloads shared context like CLAUDE.md. You're duplicating prefixes.
This is where context caching earns its keep. Anthropic's prompt cache keeps the shared prefix warm across subagent spawns. The first subagent pays full freight; the next four ride the cache. Suddenly the architecture is affordable.
Without caching, distributed context is too expensive to use at scale. With it, the pattern becomes the default.
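The economics are easy to see with back-of-envelope arithmetic. The prices below are illustrative assumptions, not Anthropic's actual rate card — what matters is the ratio, not the dollars.

```python
# Cost of dispatching 5 subagents that each reload a shared 20k-token
# prefix (CLAUDE.md, plan doc, etc.). All numbers are assumed for illustration.
PRICE_PER_MTOK_INPUT = 3.00    # assumed full input price per million tokens
CACHE_READ_DISCOUNT = 0.10     # assumed: cached reads cost ~10% of full price

shared_prefix = 20_000   # tokens duplicated into every subagent
task_tokens = 5_000      # tokens unique to each subagent's brief and work
n_subagents = 5

def cost(tokens: float) -> float:
    return tokens / 1_000_000 * PRICE_PER_MTOK_INPUT

# Without caching: every subagent pays full price for the shared prefix.
uncached = n_subagents * cost(shared_prefix + task_tokens)

# With caching: the first subagent pays full freight; the rest read the
# shared prefix from cache at a steep discount.
cached = (
    cost(shared_prefix + task_tokens)                        # first spawn, cache miss
    + (n_subagents - 1) * (cost(shared_prefix) * CACHE_READ_DISCOUNT
                           + cost(task_tokens))              # later spawns, cache hit
)

print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
# With these assumed numbers, caching cuts the bill by more than half.
```

Scale the subagent count up and the gap widens: the uncached bill grows linearly in the full prefix, the cached bill only in the discounted reads.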
Where It Breaks
So if subagents are this architecturally sound, why do people keep getting burned by them?
Because they think subagents let them skip the planning.
This is the whole trap. You see the nice clean output from a well-scoped research task and you think, "Great — I'll delegate the hard stuff too." You hand a subagent something vague. "Build me the dashboard." "Refactor the auth flow." "Update all the components to use the new design system."
The subagent does its best. It fills every gap with its best approximation of what you meant. And because it has no conversation history, no access to the back-and-forth that would have clarified the spec, it guesses wrong in ways the main agent wouldn't have.
You get garbage. Except now you've scaled the problem. Instead of one confused agent producing one wrong answer you could correct mid-stream, you've got five subagents producing five confidently wrong answers in parallel.
Here's the failure mode that actually kills teams: dispatch three subagents to build three components for the same view without a shared spec. You'll get three beautifully written, mutually incompatible implementations. Different spacing. Different prop patterns. Different assumptions about the data shape. Each one is fine in isolation. Together they're a mess. And now you have to reconcile them all.
That isn't a subagent failure. It's a briefing failure. The subagents did exactly what they were asked — which was nothing coherent.
Serial errors get caught at step two. Parallel errors ship together.
Subagents amplify whatever discipline you bring. A tight plan gets clean parallel execution. A vague plan gets five times the mess.
The Workflow That Actually Works
Once I understood the architecture, the workflow stopped feeling like magic and started feeling like engineering.
The pattern that actually works, every time:
Plan in the orchestrator first. This is non-negotiable. Before you dispatch anything, the main agent (or you) does the thinking. What are the components? What types do they share? What conventions do they follow? What does the integration look like? Decide all of this before a single subagent spawns.
Brief tight, with shared context. Each subagent gets a prompt that's self-contained. Not "build ComponentX." It's: a pointer to the shared plan doc, the exact task for this subagent, the exact output contract, and a reference example to match. If the subagent has to guess about conventions, your brief wasn't tight enough.
Verify outputs. Don't trust summaries. When a subagent reports "done," that's what it intended to do. Not necessarily what it did. Read the files. Run the tests. Check the schema. Claude Code's own system prompt warns about this — an agent's summary describes its intent, not its work.
Integrate in the orchestrator. The parent agent wires it all together. Catches drift. Pushes back when two subagents contradict each other. Makes the final call.
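The four steps can be condensed into a checklist in code. A minimal sketch — the field and function names are my own, not any SDK structure. The two invariants it encodes: a brief must be self-contained, and "done" is verified against artifacts, never trusted from the summary.

```python
from dataclasses import dataclass, fields

@dataclass
class SubagentBrief:
    """Everything a subagent needs, since it cannot see the conversation.
    Field names are illustrative, not part of any SDK."""
    plan_doc: str           # pointer to the shared plan (step 1: plan first)
    task: str               # the exact slice this contractor owns
    output_contract: str    # what "done" looks like, precisely
    reference_example: str  # an existing file to match for conventions

def is_self_contained(brief: SubagentBrief) -> bool:
    # A brief with any empty field forces the subagent to guess.
    return all(getattr(brief, f.name).strip() for f in fields(brief))

def verify_output(files_changed: list[str], tests_passed: bool) -> bool:
    # Step 3: don't trust the report. "Done" is the subagent's intent;
    # check the actual artifacts before the orchestrator integrates them.
    return bool(files_changed) and tests_passed

brief = SubagentBrief(
    plan_doc="docs/dashboard-plan.md",
    task="Build the UsageChart component per section 3 of the plan.",
    output_contract="One file: src/components/UsageChart.tsx, typed props, tests pass.",
    reference_example="src/components/RevenueChart.tsx",
)
assert is_self_contained(brief)  # only dispatch once this holds
```

If `is_self_contained` fails, the fix is never "dispatch anyway and correct later" — it's back to step one, because the gap will be filled by a guess you can't see.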
This is harness engineering one layer deeper. Same discipline — structured inputs, checkpoints, verification — applied inside the orchestration layer. Same planning muscle applied before delegation.
The Claude SDK gives you the primitives. AgentDefinition for specialized subagent types. Tool restrictions. Parallel dispatch. It hands you the pieces. The discipline is still yours.
The Discipline Scales
The subagent architecture is genuinely powerful. It solves problems that don't fit in one head — yours or the AI's. It lets you push through the context ceiling without degrading quality. It gives you parallelism for free when work is truly independent.
But it rewards the same skill that separates senior engineers from juniors: the ability to break a problem down cleanly before touching code.
Every post I've written on this site comes back to the same point, because the point keeps being right. Context caching, the Ralph loop, harness engineering, Superpowers — they're all variations on one theme. Structured input produces structured output. Vague input produces vague output. AI amplifies whatever you bring.
Subagents are the highest-leverage version of that rule I've seen so far. They scale execution faster than any other primitive. Which means they scale your planning discipline — or your lack of it — faster than anything else.
Pick which one you want to scale.
The vibe coder spawns five subagents at once and gets five times the confusion. The engineer briefs them into precision and ships something coherent. Same tool. Same model. Radically different outcomes.
That's the gap. It's never been the AI.
— Bill John Tran