Why Claude — and Claude Code — build specs best
We've shipped packs to every frontier coding agent on the market. We keep coming back to Claude. Here's the honest reason — and how Claude Code closes the loop from sealed spec to merged PR.
We get a version of this question almost every week, usually on the second call: “Are you Claude-only? Why?”The honest answer is that we're not religious — the BMAD pack is a portable artefact, and we've handed it to every frontier coding agent on the market at one point or another. But when a customer asks us what they should run on the other side of the MCP socket, we say Claude. And when they ask which harness, we say Claude Code. This post is why.
A spec is a contract. Claude reads contracts well.
The thing a sealed spec asks of a model is unusual. It isn't a creative-writing prompt and it isn't a one-shot code completion. It's closer to legal review: read these eight files, hold them all in mind at once, do not invent clauses, and when something is ambiguous, raise it instead of guessing. That last instruction is the one most models fail on. They are trained to be helpful, and helpfulness in a vacuum looks like making something up.
Claude is the model we've found most willing to stop. It will read the pack, notice the gap, and ask the question — even when nothing about the prompt explicitly invites the question. For a spec-driven workflow, that property is worth more than any benchmark score. A model that ships the wrong feature confidently is worse than a model that pauses.
We don't want a model that's impressive in isolation. We want a model that respects the contract — and tells us when the contract is wrong.
The 1M-context window changed our pack design
We sized the BMAD pack for the long-context era on purpose. Eight planning files plus per-story implementation comes to somewhere between sixty and two hundred thousand tokens for a typical mid-sized product. That used to be an architectural problem — chunking, retrieval, summarisation passes, lossy compression of the very thing we just signed. With Claude's million-token context window, the pack fits. The agent reads the whole contract on every turn. There is no chunker between intent and execution. That is the bug we built SpecGraph to kill, and it is finally dead.
Tool use that feels like engineering, not stage magic
BMAD is not just markdown — it's tools. open_story, cite_section, post_question, update_status. The model has to reach for the right tool at the right moment, with structured arguments that match a typed schema, and recover gracefully when a call fails. Claude's tool-use behaviour is the most disciplined we've seen in production: it picks the right tool, fills the arguments, reads the result, and threads the answer back into the next decision. It does not perform tool use; it uses tools.
Claude Code is the harness, not just the model
Plenty of teams confuse Claude with Claude Code. They are not the same thing. Claude is the model. Claude Code is the terminal-native, repo-aware harness that knows how to run commands, edit files, open PRs, react to test output, and respect a CLAUDE.md as a constitution. For spec-driven work, the harness matters as much as the model — because the harness is what closes the loop between “here's the pack” and “here's the merged commit”.
- It honours
CLAUDE.md. Every BMAD pack ships one. Claude Code reads it first, before the prompt — and treats it as durable instruction, not flavour text. - It cites by default. A sealed spec only works if the agent references PRD sections in its commits. Claude Code does this naturally, with a nudge from the pack.
- It speaks MCP. Our BMAD server hands packs and tools over MCP. Claude Code is the harness that consumes them with the least ceremony and the most fidelity.
- It runs subagents. Architecture review, security pass, test authoring — Claude Code spawns specialised agents that share the pack but isolate their context. This is how we keep big features focused.
- It pauses when it should. Risky actions, irreversible operations, anything not green-lit by the pack — it asks. The harness has the same restraint as the model.
Why SpecGraph defaults to Claude
The decision wasn't made on a benchmark. It was made on a pattern we kept seeing: a sealed pack handed to Claude Code produced PRs that looked like a senior engineer wrote them with the spec open in the next pane. Commit messages cited the right section. Story keys appeared in the right places. Ambiguities came back as questions, not as silently shipped guesses. When something needed a human, the agent stopped and asked. When nothing needed a human, it kept going. That is the loop we want to be repeatable — and Claude Code is where it's repeatable today.
We still keep the pack model-agnostic. Customers running other agents in other harnesses can ingest the same artefact and get most of the way there. But when a team asks us how to get the full SpecGraph experience — sealed spec, MCP-served pack, agent that cites and pauses, PRs that read like papers — the answer is Claude on the model side and Claude Code on the harness side. That isn't marketing. It's the configuration that produces the fewest “wait, that's not what we asked for” moments at review time.
Specs come from people. Packs go to agents. Claude Code is the place where the pack stops being a document and starts being a commit.
How a SpecGraph project flows through Claude Code
For teams who haven't seen the loop end-to-end, here's the version we walk through on the second call. It is not choreography. It is the actual sequence we run on customer engagements.
- Seal. Spec hits
v1.0.0. SpecGraph composes the BMAD pack and exposes it on its MCP endpoint, scoped to a per-project token. - Connect. The team adds the SpecGraph MCP server to their
.claude/settings.json. Claude Code now sees the pack as a first-class resource, alongside the repo. - Constitute. The pack's
CLAUDE.mdis committed at the repo root. Claude Code reads it before every turn and treats it as the law. - Plan. The agent reads
epics.md, picks the next story by key, and pulls the matchingimpl/file. It cites the PRD sections it intends to satisfy. - Build. Claude Code edits, runs tests, opens a PR. Commit messages reference the story key. PR description leads with the citation graph.
- Reconcile. If the agent finds an ambiguity, it calls
post_questionon the MCP server instead of guessing. The question lands in SpecGraph as a wish, ready for the team to resolve. - Close. When the test plan items pass,
update_statusmoves the story to done. The audit trail is complete: pack version, commit, citation, test, sign-off.
What you can do this week
- If you're already running Claude Code, write a one-page
CLAUDE.mdfor your repo today. Tell the agent what not to invent. Half the drift in your next sprint disappears. - Hand your existing brief to Claude and ask it to identify the three biggest ambiguities. The list will be uncomfortable. That is the point.
- If you're evaluating SpecGraph, run the playground pack against Claude Code first. The loop is shortest there. You'll see what “cite-by-default” looks like in practice within an hour.
We'll keep supporting other models — the artefact is the artefact, and portability is non-negotiable. But on the question of which agent reads a contract the way a contract is meant to be read, our answer is settled. It's Claude. And the harness that turns the contract into a commit is Claude Code.
- claude
- claude-code
- agents
- tooling