1. Prerequisites
This guide assumes you have already completed the Getting Started workflow and have a working GSD installation. You should be comfortable with:
- Running `/gsd:help` and navigating the command menu
- Creating milestones with `/gsd:new-milestone` and adding phases
- The basic plan-execute-verify loop (`/gsd:plan-phase`, `/gsd:execute-phase`, `/gsd:verify-work`)
- Reading `.planning/STATE.md` and `.planning/ROADMAP.md` for project context
You should also have:
- GSD upstream v1.30.0+ with SDK and multi-runtime support
- Claude Code v2.1.88+ with the latest hook system (PostCompact, FileChanged, PermissionDenied, if-conditions)
- A project with at least one completed milestone so you understand the full lifecycle
The Getting Started guide teaches you the workflow. This guide teaches you the system — the patterns, architecture, and strategies that make autonomous execution possible at production scale.
2. The Vision-to-Mission Pipeline (VTM)
Every significant piece of work in GSD passes through a five-stage pipeline that transforms an idea into a verified deliverable. The pipeline is not metaphorical — it maps directly to GSD commands and file artifacts.
The Pipeline in Practice: BLN Textbook
The Blender User Manual (BLN) demonstrates the full pipeline at scale. The vision was a comprehensive Blender manual — 10 research modules covering all of Blender 4.4, from interface to furry arts production.
Vision: A mission pack defined the scope: 10 chapters, glossary, cross-references, production-quality prose. Success criteria were explicit — 100K+ words, 6 Blender domains, 350+ target pages.
Research: Five research agents ran in parallel, each responsible for 2 modules. They produced 107K words of structured research — technical depth with pedagogical framing — in a single wave.
Planning: The research output fed into chapter plans. Each chapter had a REQUIREMENTS.md and PLAN.md with section-level specifications, glossary terms, and cross-reference targets.
Execution: Six chapter expansion agents ran in parallel. Each received its research module plus the master glossary, ensuring consistency. The output was 329 pages of final content.
Verification: Goal-backward analysis checked every success criterion against the deliverables. Coverage audits confirmed cross-module references resolved correctly. The verification matrix scored 12/12 PASS.
Total: 11 parallel agents, 107K words of research, 329-page PDF, one session.
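The five stages above can be sketched as ordered data. This is an illustrative sketch only: the stage names come from this guide, but the artifact pairings are a simplification, and `next_stage` is a hypothetical helper, not a GSD API.

```python
# Hypothetical sketch of the Vision-to-Mission pipeline as ordered data.
# Artifact pairings are drawn from this guide; treat the exact mapping
# as illustrative, not authoritative.
PIPELINE = [
    ("vision",       "mission pack / success criteria"),
    ("research",     "research documents"),
    ("planning",     "REQUIREMENTS.md + PLAN.md"),
    ("execution",    "committed deliverables"),
    ("verification", "verification matrix"),
]

def next_stage(current: str):
    """Return the stage that follows `current`, or None at the end."""
    names = [name for name, _ in PIPELINE]
    i = names.index(current)
    return names[i + 1] if i + 1 < len(names) else None
```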
3. Wave-Based Parallel Execution
GSD decomposes execution into waves — ordered batches of work where parallelism is safe within a wave and sequencing is enforced between waves. This is how the system runs multiple agents without race conditions or context conflicts.
| Wave | Model | Role |
|---|---|---|
| Wave 0 | Haiku | Foundation. Sequential. Schemas, templates, directory structure, shared constants. Everything downstream depends on these artifacts existing. Haiku is fast and cheap — foundation work is mechanical. |
| Wave 1 | Sonnet | Content production. Parallel. Multiple agents writing simultaneously into non-overlapping file spaces. Each agent gets its own scope (chapter, module, component). Sonnet balances quality with throughput. |
| Wave 2 | Opus | Synthesis. Sequential. Cross-module integration, cross-reference resolution, consistency audits, style normalization. Opus handles the long-context reasoning this requires. |
| Wave 3 | Sonnet/Haiku | Publication. Sequential. Assembly into final output format, verification against success criteria, delivery (commit, release, FTP sync). Mechanical work, lower model tier. |
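The wave structure reduces to a simple scheduling rule: parallel inside a wave, a hard barrier between waves. A minimal sketch, assuming each task is a plain callable (GSD's real dispatcher also assigns a model tier per wave, which is omitted here):

```python
from concurrent.futures import ThreadPoolExecutor

def run_waves(waves):
    """Run each wave in order; tasks inside a wave run in parallel.

    `waves` is a list of lists of zero-argument callables. The sequential
    barrier between waves guarantees downstream tasks see completed
    upstream output.
    """
    results = []
    for wave in waves:
        with ThreadPoolExecutor(max_workers=max(1, len(wave))) as pool:
            # pool.map preserves task order within the wave
            results.append(list(pool.map(lambda task: task(), wave)))
        # implicit barrier: leaving the `with` block joins all workers
    return results
```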
Real Numbers
The BLN mission used 11 agents across 4 waves with an estimated 761K tokens. The HEL research project used 9 agents producing 28 documents and 91K words. The Seattle 360 engine has shipped 64 autonomous releases so far, each following the same wave structure at smaller scale.
Why Waves Work
Waves solve the fundamental tension between parallelism and correctness. Within a wave, agents operate on disjoint file sets — no merge conflicts, no overwritten work. Between waves, the sequential barrier ensures downstream agents see the complete output of upstream work. The model assignment per wave (Haiku for scaffolding, Sonnet for content, Opus for synthesis) optimizes cost without sacrificing quality where it matters.
4. The Skill System
Skills are the adaptive learning layer. They auto-activate based on context — you never invoke them manually. When you start planning a phase, the gsd-workflow skill loads. When you commit, beautiful-commits activates. When you work on research, research-engine provides the pipeline.
Skill Anatomy
Each skill is a YAML-frontmatter markdown file in .claude/skills/. The frontmatter defines activation:
```yaml
---
name: research-engine
description: Autonomous research pipeline
triggers:
  intents:
    - research
    - investigate
    - deep dive
  contexts:
    - planning research phases
    - producing research documents
applies_to:
  - "www/**/*.html"
  - "www/**/*.md"
---
```
The body contains the skill's instructions — the knowledge and patterns the agent should apply when the skill is active. Skills compose naturally: when you run a research phase that produces commits, research-engine + gsd-workflow + beautiful-commits all activate simultaneously.
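Conceptually, activation is a match between the frontmatter triggers and the current context. The sketch below is hypothetical — GSD's real matcher is richer — but it shows the idea: intents match the user's message, `applies_to` globs match touched files, and either is enough to load the skill.

```python
from fnmatch import fnmatch

def skill_activates(skill, message, changed_files):
    """Hypothetical activation check against a skill's trigger frontmatter.

    `skill` mirrors the YAML above: {"triggers": {"intents": [...]},
    "applies_to": [...]}. Illustrative only; not GSD's actual matcher.
    """
    intents = skill.get("triggers", {}).get("intents", [])
    globs = skill.get("applies_to", [])
    intent_hit = any(i in message.lower() for i in intents)
    file_hit = any(fnmatch(f, g) for f in changed_files for g in globs)
    return intent_hit or file_hit
```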
69 Installed Skills
Skills are organized by domain: GSD workflow management, agent orchestration (Gastown convoy, GUPP, fleet-mission), content pipelines (research-engine, publish-pipeline), session management (session-awareness, context-handoff, beads-state), security (security-hygiene), and development patterns (typescript-patterns, test-generator, code-review). The skill-integration skill manages the lifecycle of all other skills — it observes patterns and can suggest new skills when it detects recurring workflows.
How Skills Compose
The power of the skill system is composition. A typical autonomous research mission activates:
- `gsd-workflow` — phase routing, lifecycle management
- `research-engine` — topic-to-document pipeline, HTML/PDF output
- `publish-pipeline` — Pandoc + XeLaTeX templates, FTP sync
- `beautiful-commits` — conventional commits with semantic structure
- `session-awareness` — cross-session state, crash recovery
- `beads-state` — git-friendly persistence for orchestration state
No configuration required. The skills detect context and load themselves.
5. Agent Orchestration Patterns
The system includes five proven orchestration patterns, each designed for a different topology of work. These are not theoretical — every pattern has been validated at production scale with real deliverables.
Gastown Convoy
Named after the Gastown district model. A mayor dispatches work to polecats (worker agents), a witness monitors progress and detects stalls, and a refinery merges the results. The mayor never does content work — only coordination. Polecats are ephemeral and stateless. The refinery is deterministic.
GUPP
Get Up and Push Protocol. An interrupt controller that converts the default poll-based agent model into proactive execution. Instead of waiting for the user to check status and dispatch next steps, GUPP detects completion events and immediately pushes the next work item. Configurable thresholds per runtime.
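The core of GUPP is replacing a poll loop with a completion callback that immediately dispatches the next item. A minimal sketch, assuming work items are plain callables (the class and method names are illustrative; the real protocol adds per-runtime thresholds):

```python
from collections import deque

class Gupp:
    """Sketch of the Get Up and Push idea: a completion event pushes the
    next queued work item immediately, instead of waiting for a poll."""

    def __init__(self, work_items):
        self.queue = deque(work_items)
        self.done = []

    def on_complete(self, result):
        """Completion callback: record the result, then push the next item."""
        self.done.append(result)
        if self.queue:
            nxt = self.queue.popleft()
            self.on_complete(nxt())  # dispatch immediately, no poll loop

    def start(self):
        """Kick off the chain with the first queued item."""
        if self.queue:
            item = self.queue.popleft()
            self.on_complete(item())
```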
DACP
Deterministic Agent Communication Protocol. Every handoff is a three-part bundle: (1) what was accomplished, (2) what remains, (3) critical context the next agent needs. Eliminates the "lost context" problem that plagues multi-agent systems. Bundles are files on disk — crash-recoverable by design.
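Because bundles are files on disk, a handoff can be sketched as a plain write-then-read round trip. The JSON layout and field names below are illustrative, not DACP's actual schema; only the three-part structure is taken from the description above.

```python
import json
from pathlib import Path

def write_handoff(path, accomplished, remaining, context):
    """Write a DACP-style handoff bundle to disk (illustrative schema)."""
    bundle = {
        "accomplished": accomplished,  # (1) what was done
        "remaining": remaining,        # (2) what is left
        "context": context,            # (3) critical context for the next agent
    }
    Path(path).write_text(json.dumps(bundle, indent=2))

def read_handoff(path):
    """Crash recovery is just reading the file back."""
    return json.loads(Path(path).read_text())
```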
Sling-Dispatch
An instruction dispatch pipeline that implements a 7-stage cycle: fetch available work, allocate to an agent, prepare the execution context, dispatch, monitor, collect results, update state. Sling handles the queue management so the mayor pattern can focus on coordination decisions.
Fleet Mission
Parallel agent dispatch with progress tracking and result aggregation. Unlike Gastown (heterogeneous work), Fleet Mission launches N agents with the same template but different inputs — like 5 research agents each covering 2 modules. Progress tracking and result merge are built in.
Mayor-Coordinator
A Northbridge coordination pattern where a single mayor creates convoys, dispatches work via sling, and monitors completion across multiple simultaneous workstreams. The mayor maintains the global view while polecats handle local execution. Includes stall detection and nudge-sync for stuck agents.
New Subagent Fields (Claude Code v2.1.88+)
The latest Claude Code release added powerful subagent configuration fields that these patterns leverage:
- `effort` — control reasoning depth per agent
- `maxTurns` — set execution budgets to prevent runaway agents
- `isolation: worktree` — each agent gets its own git worktree, eliminating file conflicts
- `memory` — per-agent persistent memory across turns
- `skills` — explicitly load skills into subagent context
- `mcpServers` — connect agents to MCP tool servers
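Put together, a subagent definition using these fields might look like the following. The field names come from the list above; the agent name, values, and exact YAML syntax are illustrative assumptions, not a verified configuration.

```yaml
---
# Hypothetical subagent definition; field values are illustrative.
name: chapter-writer
description: Expand one research module into a chapter
model: sonnet
effort: high            # reasoning depth for this agent
maxTurns: 40            # execution budget; stops runaway loops
isolation: worktree     # private git worktree, no file conflicts
memory: true            # persistent per-agent memory across turns
skills:
  - research-engine
  - beautiful-commits
mcpServers:
  - pandoc-tools
---
```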
6. The Hook System
Hooks provide deterministic automation — behaviors that fire reliably every time, independent of the LLM's reasoning. Unlike skills (which influence the LLM's behavior), hooks execute scripts before or after tool calls and cannot be overridden by the model.
Available Hook Points
Hook points referenced in this guide include PreToolUse (before a tool call executes), PostCompact (after context compaction), FileChanged, and PermissionDenied. Hooks also support if-conditions for conditional firing (Claude Code v2.1.88+).
Example: Commit Validation Hook
A PreToolUse hook on Bash intercepts every git commit command and validates the message against Conventional Commits format before allowing it to execute:
```json
{
  "event": "PreToolUse",
  "tool": "Bash",
  "if_conditions": { "command_contains": "git commit" },
  "script": "validate-conventional-commit.sh",
  "action": "block_on_failure"
}
```
This fires every time, regardless of which skill is active or what the LLM's instructions say. The commit either passes validation or it does not. Deterministic.
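The check such a script performs can be sketched with a regex over the commit header. This is an assumption about what `validate-conventional-commit.sh` does, written in Python for illustration; the type list is the common Conventional Commits set and may differ per project.

```python
import re

# Conventional Commits header: type(scope)!: description
# The type list here is the common set from the spec; adjust per project.
_CC_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9-]+\))?!?: .+"
)

def valid_commit_header(message: str) -> bool:
    """Sketch of the check a commit-validation hook script might perform
    on the first line of a commit message."""
    header = message.splitlines()[0] if message else ""
    return bool(_CC_RE.match(header))
```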
Hook + Skill Composition
The hook system and skill system work in complementary layers. Skills guide the LLM's reasoning ("use conventional commits, include a scope"). Hooks enforce invariants the LLM must not violate ("this commit message does not match the format — blocked"). The combination produces reliable behavior: the skill makes the model want to do the right thing; the hook ensures it.
7. NASA Systems Engineering Methodology
GSD's planning methodology is adapted from NASA's systems engineering process. The core insight: planning is the hard part; once the plans are done the code is easy. The methodology has three levels, each building on the previous.
Level 1: Requirements
What must be true when this phase is complete? Requirements are concrete, verifiable, and scoped. They live in REQUIREMENTS.md and use the pattern: "The system shall [verb] [object] [constraint]." Requirements are written before any implementation discussion.
Level 2: Plan
How will the requirements be satisfied? Plans decompose into tasks, specify file paths, define wave assignments, and estimate model profiles. The plan is the execution contract — agents follow it literally. Plans live in PLAN.md and reference specific requirements by ID.
Level 3: Execution
The plan is executed by agents following the wave structure. Each task maps to a commit (or small set of commits). Execution is deterministic: given the same plan and the same model, you get substantially the same output.
Pre-Execution Intelligence
The methodology's secret weapon is answering questions before building. The /gsd:discuss-phase command gathers context through adaptive questioning. The /gsd:research-phase command produces research documents. The /gsd:list-phase-assumptions command surfaces hidden assumptions. By the time execution begins, there are no open questions — only tasks.
Verification
Verification checks what was built against what was planned. The /gsd:verify-work command performs goal-backward analysis: start from each requirement, trace forward to find the artifact that satisfies it. Any requirement without a corresponding deliverable is a gap. The 4-level verification ladder (static, artifact, behavioral, human) ensures thoroughness.
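Goal-backward analysis reduces to a set difference: requirements with no satisfying artifact are gaps. A minimal sketch, assuming a mapping from requirement IDs to deliverables (a simplification — the real command also runs the 4-level ladder):

```python
def find_gaps(requirements, artifacts):
    """Goal-backward sketch: start from each requirement and look for an
    artifact that satisfies it.

    `requirements` is a list of requirement IDs; `artifacts` maps
    requirement IDs to deliverable paths. Returns unmet requirement IDs.
    """
    return [req_id for req_id in requirements if req_id not in artifacts]
```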
8. State Management
GSD is a disk-driven state machine. All state lives in files — no databases, no external services, no hidden memory. This makes the system inspectable, diffable, and crash-recoverable.
The .planning/ Directory
The .planning/ directory is gitignored by design. It contains all mutable project state:
- `STATE.md` — current phase, active workstreams, blockers
- `ROADMAP.md` — milestone phases with status indicators
- `REQUIREMENTS.md` — per-phase requirements
- `config.json` — project configuration (model profile, toggles)
- Phase directories (`phase-01/`, `phase-02/`, ...) with PLAN.md, research, and artifacts
Cross-Session Persistence
MEMORY.md provides cross-session persistence for Claude agents. It is structured into HOT (always in context), WARM (loaded at session start), and COLD (referenced on demand) sections. The session-awareness skill manages session recovery, and context-handoff creates structured handoff documents when sessions end.
Beads-State
For multi-agent orchestration, the beads-state system provides git-friendly, crash-recoverable state persistence. Agent identities, work assignments, progress counters, and completion status are all stored as flat files that survive process crashes and context compactions. The PostCompact hook triggers state recovery automatically.
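The flat-file idea can be sketched in a few lines: one small file per agent, so anything flushed to disk survives a crash, and recovery is a directory scan. File naming and layout below are illustrative assumptions, not beads-state's actual format.

```python
from pathlib import Path

def save_bead(state_dir, agent_id, status):
    """Sketch of flat-file state in the spirit of beads-state: one small
    file per agent. (Write-then-rename would make this atomic; omitted
    for brevity.)"""
    d = Path(state_dir)
    d.mkdir(parents=True, exist_ok=True)
    (d / f"{agent_id}.status").write_text(status)

def load_beads(state_dir):
    """Recovery after a crash or compaction is a directory scan."""
    return {p.stem: p.read_text() for p in Path(state_dir).glob("*.status")}
```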
Crash Recovery
Because all state is on disk, crash recovery is reading files. The /gsd:resume-work command reads the latest state, reconstructs context, and resumes execution from where it left off. The /gsd:health command diagnoses and repairs state inconsistencies. No work is lost to crashed sessions.
9. Model Profiles
Different tasks need different levels of intelligence. GSD provides three profiles that control which model is assigned to each execution role. Switch profiles with /gsd:set-profile.
| Role | Quality | Balanced | Budget |
|---|---|---|---|
| Orchestration (mayor, coordinator) | Opus | Sonnet | Sonnet |
| Content production (polecats, fleet) | Sonnet | Sonnet | Haiku |
| Synthesis (cross-module, verification) | Opus | Sonnet | Sonnet |
| Scaffolding (schemas, templates) | Haiku | Haiku | Haiku |
| Planning (requirements, research) | Opus | Sonnet | Haiku |
Quality profile uses Opus for all reasoning-heavy work. Best for complex milestones where correctness matters more than cost. The BLN textbook used this profile.
Balanced profile is the default. Sonnet handles most work, Haiku handles scaffolding. Good for daily development where throughput and cost are both factors.
Budget profile pushes Haiku as far as it can go. Use for mechanical work: generating boilerplate, running formatting passes, producing repetitive content from templates. Not recommended for research or synthesis.
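The profile table above is a two-level lookup: profile, then role. Transcribed as data (role keys shortened for illustration; the table is the source of truth):

```python
# Role-to-model assignments transcribed from the profile table above.
PROFILES = {
    "quality":  {"orchestration": "opus",   "content": "sonnet",
                 "synthesis": "opus",   "scaffolding": "haiku",
                 "planning": "opus"},
    "balanced": {"orchestration": "sonnet", "content": "sonnet",
                 "synthesis": "sonnet", "scaffolding": "haiku",
                 "planning": "sonnet"},
    "budget":   {"orchestration": "sonnet", "content": "haiku",
                 "synthesis": "sonnet", "scaffolding": "haiku",
                 "planning": "haiku"},
}

def model_for(profile: str, role: str) -> str:
    """Resolve the model tier for a role under the active profile."""
    return PROFILES[profile][role]
```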
10. Real-World Case Studies
BLN Textbook — Publication-Quality Documentation at Scale
The Blender User Manual demonstrated the full Vision-to-Mission pipeline. A mission pack defined 10 research modules covering every domain of Blender 4.4. Five research agents ran in parallel (Wave 1), each producing ~21K words of structured technical content. Six chapter expansion agents (Wave 2) transformed research into publishable chapters with cross-references and a unified glossary. Wave 3 assembled the PDF. The verification matrix scored 12/12 PASS. The entire mission — from vision document to published PDF — completed in a single session.
HEL Research — Deep Investigation at Volume
The Helium Supply Chain research project tackled a sprawling, technical subject: helium-3 supply chains, semiconductor fabrication, Pacific Rim geopolitics, and the physics of cryogenic systems. Nine agents produced 28 research documents, each standing alone as a complete reference. The publish-pipeline skill generated both HTML pages and a formatted PDF. This proved the system could handle genuinely difficult research — not just text generation, but synthesis across multiple technical domains.
Seattle 360 Engine — Autonomous Release Pipeline
The Seattle 360 engine is a configuration-driven pipeline: for each degree of the compass, research a Pacific Northwest topic, pair it with a species from the Sound of Puget Sound series, generate an SVG line art seed, produce research documents, publish to the web, and create a GitHub release. The pipeline ran 64 autonomous releases (v1.49.135 through v1.49.198) with GUPP driving proactive execution. Each release follows the full plan-execute-verify cycle at compact scale. The engine proved that the GSD patterns work not just for large missions but for sustained, repetitive production.
11. Common Pitfalls
Agent Supervision Neglect
Agents can drift from their instructions. They may read files outside their scope, write to unexpected locations, or interpret ambiguous instructions creatively.
Fix: Always check reads and writes against the plan's file list. The witness-observer skill automates this. If an agent's behavior diverges from instructions, pause and correct.
Task Overloading
Assigning too many tasks to a single agent leads to context exhaustion. The agent starts forgetting earlier instructions or producing lower-quality output for later tasks.
Fix: Cap each agent at 5-6 tasks, and keep each agent scoped to a single milestone. If a milestone has 20 tasks, split it across 4 agents in a fleet pattern.
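The split is plain list chunking, per the 5-6 task guideline. A minimal sketch (the function name is illustrative, not a GSD API):

```python
def split_tasks(tasks, max_per_agent=5):
    """Partition a task list into per-agent batches of at most
    `max_per_agent` tasks, ready for fleet dispatch."""
    return [tasks[i:i + max_per_agent]
            for i in range(0, len(tasks), max_per_agent)]
```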
Context Exhaustion Without Handoff
Long-running sessions exhaust the context window. When compaction triggers, critical state can be lost if it was only in the conversation, not on disk.
Fix: Use the context-handoff skill proactively. Write state to .planning/ files. The beads-state system and PostCompact hook provide automatic recovery, but the best defense is never relying on conversation-only state.
Agent Failure Without Fallback
Subagents can fail — context errors, tool permission issues, or unexpected states. If your only path to completion is through a working agent, one failure blocks everything.
Fix: Always have a fallback plan to write directly. If 3 of 4 agents succeed and 1 fails, the coordinator writes the remaining content directly rather than re-dispatching.
State Corruption
Concurrent writes to .planning/ files, interrupted saves, or manual edits that violate expected formats can corrupt project state.
Fix: Run /gsd:health to diagnose. The command detects common issues (missing files, invalid JSON, orphaned phases) and offers repair. For severe corruption, /gsd:forensics performs a post-mortem investigation.
Skipping Research
Jumping straight to execution without the research phase produces superficial output. The model generates plausible-sounding content that lacks depth and accuracy.
Fix: Never skip the research wave. Even for "simple" tasks, /gsd:discuss-phase with --auto takes 30 seconds and catches assumptions that would cost hours to fix later. "Planning is the hard part, once the plans are done the code is easy."
12. Next Steps
Research Index
Browse all 190+ research projects across 13 Rosetta clusters.
GSD-2 Architecture
Deep dive into the state machine, context engineering, and extension system.
Constellation Map
Interactive force-directed graph of all research projects and their connections.
OOPS Analysis
Object-Oriented Problem Solving — 53K words across 10 methodology documents.
GitHub Repository
Source code, issues, and releases. Open source under active development.
llms.txt
Machine-readable project index for LLM context loading.