2026-05-01 5 min

Spec-Driven Development: the SDD → RPI → Harness loop

Why writing the spec before the code isn't ceremony — it's the only way to parallelize AI without rewriting three times.

Anyone who worked in a Japanese factory in the ’80s will recognize the pattern: you don’t start assembling the car without the procedure. The procedure is worked out, debated, simulated, reviewed. By the time it reaches the floor, the build work is mechanical — because the intelligence is in the procedure, not the worker.

The Toyota Production System¹ codified this as a method. Forty years later, the same logic is reappearing in AI-driven software engineering. And the joke is that we were resisting it because “specs are ceremony.” Spoiler: the spec is the one thing that makes the rest cheap.

This post describes the loop I’ve been running — I call it SDD → RPI → Harness — and why it makes parallel AI work.

SDD: the spec as canonical document

Spec-Driven Development is simple in principle: you don’t start by asking the AI agent to write code. You ask it to write the spec for the code: what the code will do, the inputs, the outputs, the edge cases, the architecture decisions, and why.

The spec is text. Markdown. Versioned in Git, in the same repo as the code. Reviewed by humans (or by another AI agent, in critic mode). Once approved, it becomes the canonical document for everything that follows.

The spec is not a vague requirement like “implement authentication.” It’s something like:

```markdown
# Spec: auth/email-magic-link

## Goal
Allow passwordless login via a single-use link sent by email.

## Behavior
1. POST /auth/request-link {email} → generates token, sends email.
2. GET /auth/verify?token=X → validates, creates session, redirects.
3. Token expires in 15min, single-use, idempotent on GET (clicking 2x doesn't log in 2x).

## Constraints
- Token generated via Web Crypto, 32 random bytes, base64url.
- Storage: Redis with a 15min TTL, key = sha256(token).
- Email via Resend; template in src/emails/magic-link.tsx.

## Out of scope
- Password recovery (there's no password).
- 2FA (separate spec).

## Acceptance criteria
- [ ] Link delivered in < 30s for 99% of cases.
- [ ] Clicking an expired link returns 410 Gone.
- [ ] The same email can have at most 3 active links.
```
You can read that and predict what the code will do. More than that: the agent that’s going to implement can read it and produce code that does exactly that, with no guessing.
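To make that concrete, here is a minimal TypeScript sketch of the token rules from the spec, assuming Node.js. The names `issueToken`, `verifyToken`, and `store` are hypothetical. The `Map` is only a stand-in for Redis, and the 401 for an unknown or already-used token is my assumption, since the spec names a status code only for expired links.

```typescript
import { createHash, webcrypto } from "node:crypto";

const TOKEN_TTL_MS = 15 * 60 * 1000; // spec: token expires in 15 min

// Hypothetical in-memory stand-in for Redis: key = sha256(token).
type Entry = { email: string; expiresAt: number };
const store = new Map<string, Entry>();

const keyFor = (token: string): string =>
  createHash("sha256").update(token).digest("hex");

// Spec constraint: 32 random bytes from Web Crypto, base64url-encoded.
function issueToken(email: string, now: number = Date.now()): string {
  const bytes = webcrypto.getRandomValues(new Uint8Array(32));
  const token = Buffer.from(bytes).toString("base64url");
  store.set(keyFor(token), { email, expiresAt: now + TOKEN_TTL_MS });
  return token;
}

// Returns the HTTP status the verify endpoint would answer with.
function verifyToken(token: string, now: number = Date.now()): 302 | 401 | 410 {
  const key = keyFor(token);
  const entry = store.get(key);
  if (!entry) return 401; // assumption: unknown or already-used token
  store.delete(key); // single-use: a second click finds nothing
  if (now > entry.expiresAt) return 410; // spec: expired link returns 410 Gone
  return 302; // the session would be created here, then redirect
}
```

The sketch has to decide one thing on its own: with a plain 15-minute Redis TTL the key would simply vanish, so telling “expired” (410) apart from “never existed” needs the entry to outlive its expiry. That is the kind of gap a careful read of the spec should surface before any code is written.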

RPI: the reverse plan

Reverse Plan Implementation is the intermediate step between the spec and the code. The agent that’s going to implement — different from the one that wrote the spec — reads the spec and produces an implementation plan. A list of bite-sized tasks, each with:

- Files to create or modify.
- Enough code to reproduce the task without reading the rest.
- A specific acceptance criterion (usually a test).

The plan is reviewed before any code is written. If the plan is wrong, it’s cheap to fix. If the plan is right, executing is mechanical.
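One way to make that review cheap is to give the plan a fixed shape and check it before a human even looks. The sketch below is an illustration under my own assumptions: the `PlanTask` fields, the `findGaps` helper, and the file paths in the sample plan are all hypothetical.

```typescript
// Hypothetical shape for one plan task; the field names are illustrative.
interface PlanTask {
  id: string;
  files: string[]; // files to create or modify
  snippet: string; // enough code to reproduce the task in isolation
  acceptance: string; // a specific criterion, usually a test
}

// Lists what is missing from each task; an empty result means the plan is ready for review.
function findGaps(tasks: PlanTask[]): string[] {
  const gaps: string[] = [];
  for (const task of tasks) {
    if (task.files.length === 0) gaps.push(`${task.id}: no files listed`);
    if (task.snippet.trim() === "") gaps.push(`${task.id}: no code snippet`);
    if (task.acceptance.trim() === "") gaps.push(`${task.id}: no acceptance criterion`);
  }
  return gaps;
}

// Example plan for the magic-link spec; paths and test names are made up.
const plan: PlanTask[] = [
  {
    id: "T1",
    files: ["src/auth/token.ts"],
    snippet: "export function issueToken(email: string): string { /* ... */ }",
    acceptance: "test: token is 43 base64url characters",
  },
  {
    id: "T2",
    files: ["src/auth/verify.ts"],
    snippet: "export function verifyToken(token: string): number { /* ... */ }",
    acceptance: "", // left empty on purpose
  },
];

console.log(findGaps(plan));
```

Here the check reports a single gap, the missing acceptance criterion on T2, which is exactly the sort of problem that is cheaper to fix in the plan than in the code.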

The important inversion here: the plan is not built by the same mind that wrote the spec. It’s a different agent. That guards against two failure modes:

- Author’s bias: the agent that wrote the spec knows too much about its own decisions. It won’t catch ambiguities. A fresh reader does.
- Context drift: if the same agent writes spec and plan, the plan tends to “fill the spec’s holes” without recording it. A plan written from scratch exposes what the spec didn’t say.

Harness: the validation mesh

Harness is the piece that makes the loop safe. It’s the set of checks (automated tests, plus lint, type check, build, and any other quality gates) that has to pass before a change is considered done.

In SDD, the harness is written alongside the spec. The spec’s acceptance criteria become, mechanically, the harness’s tests. When the implementation agent finishes, the harness runs. If it passes, the task is done. If it fails, the agent sees the failure and iterates.
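As a rough illustration of the “mechanically” part, the sketch below pulls the checklist out of a spec’s acceptance-criteria section and emits one pending test per item. The function names are hypothetical, and the output borrows the `test.todo` form from Jest. It only produces placeholders; someone still has to write the assertions.

```typescript
// Pulls the checklist items out of the "## Acceptance criteria" section of a spec.
function acceptanceCriteria(specMarkdown: string): string[] {
  const criteria: string[] = [];
  let inSection = false;
  for (const rawLine of specMarkdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      inSection = line.toLowerCase() === "## acceptance criteria";
      continue;
    }
    const item = /^- \[[ xX]\] (.+)$/.exec(line)?.[1];
    if (inSection && item !== undefined) criteria.push(item);
  }
  return criteria;
}

// Emits one pending test per criterion, in the style of Jest's test.todo.
function toTestStubs(criteria: string[]): string {
  return criteria.map((c) => `test.todo(${JSON.stringify(c)});`).join("\n");
}

const spec = [
  "# Spec: auth/email-magic-link",
  "",
  "## Out of scope",
  "- 2FA (separate spec).",
  "",
  "## Acceptance criteria",
  "- [ ] Link delivered in < 30s for 99% of cases.",
  "- [ ] Clicking an expired link returns 410 Gone.",
  "- [ ] The same email can have at most 3 active links.",
].join("\n");

console.log(toTestStubs(acceptanceCriteria(spec)));
```

Items outside the acceptance-criteria section, such as the out-of-scope bullets, are ignored, so the stubs track the criteria and nothing else.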

The crucial point: the harness is the objective arbiter. It’s not the agent deciding whether it’s right. It’s an external, deterministic system that says yes or no.
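A minimal sketch of such an arbiter, assuming Node.js: it runs each check as a subprocess and decides purely from exit codes. The `Check` and `Verdict` types, the `runHarness` name, and the two npm scripts are assumptions for illustration; a real project would list its own lint, type-check, and build commands.

```typescript
import { spawnSync } from "node:child_process";

type Check = { name: string; command: string; args: string[] };
type Failure = { name: string; output: string };
type Verdict = { passed: boolean; failures: Failure[] };

// Runs every check and decides purely from exit codes; nothing here asks the agent's opinion.
function runHarness(checks: Check[]): Verdict {
  const failures: Failure[] = [];
  for (const check of checks) {
    const result = spawnSync(check.command, check.args, { encoding: "utf8" });
    if (result.status !== 0) {
      // Keep the output so the implementing agent can see what failed and iterate.
      failures.push({
        name: check.name,
        output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
      });
    }
  }
  return { passed: failures.length === 0, failures };
}

// Assumed project scripts; swap in whatever commands the repo actually has.
const projectChecks: Check[] = [
  { name: "test", command: "npm", args: ["test"] },
  { name: "build", command: "npm", args: ["run", "build"] },
];
```

Calling `runHarness(projectChecks)` returns the same kind of verdict every time: `passed` is true only if every command exited with 0. A check whose command cannot be started at all also counts as a failure, because its exit status is not 0.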

Without a harness, AI is theater: the agent says “I implemented it and it works” and you believe it. With a harness, AI is engineering: the agent says “implemented,” the harness says “passes,” and you trust the harness.

Why this loop parallelizes

The reason I went after this structure is practical: I wanted to run 5 agents in parallel without each stepping on the others. In serial AI (one agent doing one thing at a time), you can get away with improvising. In parallel AI, the absence of a spec is fatal.

Each agent, in parallel, reads its spec. Implements. The harness validates. The harness is the only thing that gives the verdict. There’s no human review blocking the pipeline — the human review happened during the spec review. The code is just the mechanical translation.
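In code, the shape of that pipeline is small. The sketch below is an assumption-laden outline, not a real agent framework: `implement` stands in for whatever call drives an agent, `harness` for the validation run, and the limit of three attempts is arbitrary.

```typescript
type Spec = { id: string; text: string };
type Verdict = { passed: boolean; report: string };
type Outcome = { id: string; passed: boolean; attempts: number };

// Hypothetical hooks: `implement` stands in for an agent call, `harness` for the test run.
type Implement = (spec: Spec, feedback: string | null) => Promise<void>;
type Harness = (spec: Spec) => Promise<Verdict>;

// One agent, one spec: implement, let the harness judge, feed failures back, repeat.
async function runSpec(
  spec: Spec,
  implement: Implement,
  harness: Harness,
  maxAttempts = 3, // arbitrary cap so a stuck agent cannot loop forever
): Promise<Outcome> {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await implement(spec, feedback);
    const verdict = await harness(spec);
    if (verdict.passed) return { id: spec.id, passed: true, attempts: attempt };
    feedback = verdict.report; // the agent sees the failure on the next round
  }
  return { id: spec.id, passed: false, attempts: maxAttempts };
}

// All specs run concurrently; no agent reads another agent's spec or output.
function runAll(specs: Spec[], implement: Implement, harness: Harness): Promise<Outcome[]> {
  return Promise.all(specs.map((spec) => runSpec(spec, implement, harness)));
}
```

The only shared component is the harness, and the only thing that ends a run as “passed” is its verdict. A spec that never passes comes back as a failed outcome instead of blocking the others.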

This isn’t sci-fi. It’s how Ford ran the assembly line in 1913. The genius wasn’t in the worker, it was in the engineering of the line.

Where Anthropic sits in this story

The Claude Code subagents docs describe a version of this pipeline — specialized agents (planner, implementer, reviewer) that cooperate in a loop. Anthropic’s writing on multi-agent systems shows results in real codebases. Cognition has published its own technical notes on the architecture of Devin with similar logic.

It all points the same direction: what makes an AI good at writing code isn’t how well it writes code. It’s how well it understands the spec. The rest is mechanics.

For anyone who wants to start

Three practical steps that work even on a small project:

1. Before asking for code, ask for the spec. In Markdown. Short. With an acceptance-criteria section.
2. Before implementing, ask for the plan. From a different agent, in a different session. Or from yourself, reading the spec with a reader’s eyes.
3. Before accepting, run the harness. Even if it’s just npm test && npm run build. The point is having an objective arbiter.

It works on a one-person project. It works on a 50-person one. The only difference is the number of simultaneous specs.

¹ The Toyota Production System was codified by Taiichi Ohno across a series of internal manuals, and later popularized in the West by James Womack, Daniel Jones, and Daniel Roos in “The Machine That Changed the World” (1990). The core idea, that the procedure carries the intelligence, not the operator, is what we’re applying again here, 35 years late.