GSD + Skill Creator Advanced Guide

Autonomous pipeline orchestration, wave-based parallelism, and adaptive learning at scale
v1.49.203 · 21,298 tests · April 2026
69 skills · 39 agents · 57 commands · 190+ research projects · 8 runtimes supported

Contents

  1. Prerequisites
  2. The Vision-to-Mission Pipeline
  3. Wave-Based Parallel Execution
  4. The Skill System
  5. Agent Orchestration Patterns
  6. The Hook System
  7. NASA Systems Engineering Methodology
  8. State Management
  9. Model Profiles
  10. Real-World Case Studies
  11. Common Pitfalls
  12. Next Steps

1. Prerequisites

This guide assumes you have already completed the Getting Started workflow, have a working GSD installation, and are comfortable with the core commands and concepts it introduces.

The Getting Started guide teaches you the workflow. This guide teaches you the system — the patterns, architecture, and strategies that make autonomous execution possible at production scale.

2. The Vision-to-Mission Pipeline (VTM)

Every significant piece of work in GSD passes through a five-stage pipeline that transforms an idea into a verified deliverable. The pipeline is not metaphorical — it maps directly to GSD commands and file artifacts.

Stage 1, Vision: what are we building and why?
Stage 2, Research: deep investigation before planning.
Stage 3, Planning: NASA SE 3-level methodology.
Stage 4, Execution: wave-based parallel agents.
Stage 5, Verification: goal-backward analysis.

The Pipeline in Practice: BLN Textbook

The Blender User Manual (BLN) demonstrates the full pipeline at scale. The vision was a comprehensive Blender manual — 10 research modules covering all of Blender 4.4, from interface to furry arts production.

Vision: A mission pack defined the scope: 10 chapters, glossary, cross-references, production-quality prose. Success criteria were explicit — 100K+ words, 6 Blender domains, 350+ target pages.

Research: Five research agents ran in parallel, each responsible for 2 modules. They produced 107K words of structured research — technical depth with pedagogical framing — in a single wave.

Planning: The research output fed into chapter plans. Each chapter had a REQUIREMENTS.md and PLAN.md with section-level specifications, glossary terms, and cross-reference targets.

Execution: Six chapter expansion agents ran in parallel. Each received its research module plus the master glossary, ensuring consistency. The output was 329 pages of final content.

Verification: Goal-backward analysis checked every success criterion against the deliverables. Coverage audits confirmed cross-module references resolved correctly. The verification matrix scored 12/12 PASS.

Total: 11 parallel agents, 107K words of research, 329-page PDF, one session.

3. Wave-Based Parallel Execution

GSD decomposes execution into waves — ordered batches of work where parallelism is safe within a wave and sequencing is enforced between waves. This is how the system runs multiple agents without race conditions or context conflicts.

Wave 0 (Haiku): Foundation. Sequential. Schemas, templates, directory structure, shared constants. Everything downstream depends on these artifacts existing. Haiku is fast and cheap; foundation work is mechanical.

Wave 1 (Sonnet): Content production. Parallel. Multiple agents writing simultaneously into non-overlapping file spaces. Each agent gets its own scope (chapter, module, component). Sonnet balances quality with throughput.

Wave 2 (Opus): Synthesis. Sequential. Cross-module integration, cross-reference resolution, consistency audits, style normalization. Opus handles the long-context reasoning this requires.

Wave 3 (Sonnet/Haiku): Publication. Sequential. Assembly into final output format, verification against success criteria, delivery (commit, release, FTP sync). Mechanical work, lower model tier.
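The wave discipline above can be sketched as a small runner: tasks run in parallel within a wave, and a hard barrier separates waves. The `Task`-as-callable shape is illustrative, not GSD's actual agent API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_waves(waves):
    """Run each wave's tasks in parallel; wait for the whole wave
    to finish (the sequential barrier) before starting the next."""
    results = []
    for wave in waves:
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            # All tasks in this wave run concurrently on disjoint scopes.
            results.append(list(pool.map(lambda task: task(), wave)))
        # Leaving the `with` block joins every worker: the barrier.
    return results

# Toy tasks standing in for agents.
waves = [
    [lambda: "schemas"],                            # Wave 0: foundation
    [lambda: "ch1", lambda: "ch2", lambda: "ch3"],  # Wave 1: parallel content
    [lambda: "synthesis"],                          # Wave 2: synthesis
]
print(run_waves(waves))  # [['schemas'], ['ch1', 'ch2', 'ch3'], ['synthesis']]
```

Because each wave writes to its own file scope, joining the pool is all the coordination a barrier needs.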

Real Numbers

The BLN mission used 11 agents across 4 waves with an estimated 761K tokens. The HEL research project used 9 agents producing 28 documents and 91K words. The Seattle 360 engine has shipped 64 autonomous releases so far, each following the same wave structure at smaller scale.

Why Waves Work

Waves solve the fundamental tension between parallelism and correctness. Within a wave, agents operate on disjoint file sets — no merge conflicts, no overwritten work. Between waves, the sequential barrier ensures downstream agents see the complete output of upstream work. The model assignment per wave (Haiku for scaffolding, Sonnet for content, Opus for synthesis) optimizes cost without sacrificing quality where it matters.

4. The Skill System

Skills are the adaptive learning layer. They auto-activate based on context — you never invoke them manually. When you start planning a phase, the gsd-workflow skill loads. When you commit, beautiful-commits activates. When you work on research, research-engine provides the pipeline.

Skill Anatomy

Each skill is a YAML-frontmatter markdown file in .claude/skills/. The frontmatter defines activation:

---
name: research-engine
description: Autonomous research pipeline
triggers:
  intents:
    - research
    - investigate
    - deep dive
  contexts:
    - planning research phases
    - producing research documents
applies_to:
  - "www/**/*.html"
  - "www/**/*.md"
---

The body contains the skill's instructions — the knowledge and patterns the agent should apply when the skill is active. Skills compose naturally: when you run a research phase that produces commits, research-engine + gsd-workflow + beautiful-commits all activate simultaneously.
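Intent-based activation can be sketched as a simple match over the frontmatter triggers. The matching logic below is illustrative only; GSD's actual matcher is not documented in this guide.

```python
def active_skills(message, skills):
    """Return the names of all skills whose intent triggers appear
    in the message -- skills compose, so several can match at once."""
    msg = message.lower()
    return [
        s["name"]
        for s in skills
        if any(intent in msg for intent in s["triggers"]["intents"])
    ]

skills = [
    {"name": "research-engine", "triggers": {"intents": ["research", "investigate"]}},
    {"name": "beautiful-commits", "triggers": {"intents": ["commit"]}},
]
print(active_skills("Research module 3, then commit the results", skills))
# ['research-engine', 'beautiful-commits']
```

Both skills match the one message, which is the composition behavior described above.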

69 Installed Skills

Skills are organized by domain: GSD workflow management, agent orchestration (Gastown convoy, GUPP, fleet-mission), content pipelines (research-engine, publish-pipeline), session management (session-awareness, context-handoff, beads-state), security (security-hygiene), and development patterns (typescript-patterns, test-generator, code-review). The skill-integration skill manages the lifecycle of all other skills — it observes patterns and can suggest new skills when it detects recurring workflows.

How Skills Compose

The power of the skill system is composition. A typical autonomous research mission activates research-engine (the pipeline), gsd-workflow (phase management), and beautiful-commits (commit hygiene) simultaneously. No configuration is required: the skills detect context and load themselves.

5. Agent Orchestration Patterns

The system includes five proven orchestration patterns, each designed for a different topology of work. These are not theoretical — every pattern has been validated at production scale with real deliverables.

Gastown Convoy

Use when: coordinating 3+ agents on a shared codebase

Named after the Gastown district model. A mayor dispatches work to polecats (worker agents), a witness monitors progress and detects stalls, and a refinery merges the results. The mayor never does content work — only coordination. Polecats are ephemeral and stateless. The refinery is deterministic.

Proven at 50+ project scale (PNW Research Series)

GUPP

Use when: converting polled workflows to proactive execution

Get Up and Push Protocol. An interrupt controller that converts the default poll-based agent model into proactive execution. Instead of waiting for the user to check status and dispatch next steps, GUPP detects completion events and immediately pushes the next work item. Configurable thresholds per runtime.

Proven in 360 Engine (64 autonomous releases)

DACP

Use when: handing off context between agents or sessions

Deterministic Agent Communication Protocol. Every handoff is a three-part bundle: (1) what was accomplished, (2) what remains, (3) critical context the next agent needs. Eliminates the "lost context" problem that plagues multi-agent systems. Bundles are files on disk — crash-recoverable by design.

Proven across all multi-session research missions
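A DACP handoff bundle can be sketched as a writer/reader pair over a file on disk. The source specifies the three parts and the on-disk requirement; the JSON encoding and field names here are assumptions for illustration.

```python
import json, pathlib

def write_handoff(path, accomplished, remaining, context):
    """Persist the three-part DACP bundle to disk so the next agent
    (or a crash recovery) can read it back deterministically."""
    bundle = {
        "accomplished": accomplished,  # (1) what was done
        "remaining": remaining,        # (2) what is left
        "context": context,            # (3) critical context for the next agent
    }
    pathlib.Path(path).write_text(json.dumps(bundle, indent=2))
    return bundle

def read_handoff(path):
    return json.loads(pathlib.Path(path).read_text())

write_handoff(
    "handoff.json",
    accomplished=["chapters 1-3 drafted"],
    remaining=["chapter 4", "glossary pass"],
    context="glossary terms live in the master glossary; keep spelling consistent",
)
print(read_handoff("handoff.json")["remaining"])
```

Because the bundle is an ordinary file, a crashed session loses nothing: the next session simply reads it back.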

Sling-Dispatch

Use when: routing work items to available agents dynamically

An instruction dispatch pipeline that implements a 7-stage cycle: fetch available work, allocate to an agent, prepare the execution context, dispatch, monitor, collect results, update state. Sling handles the queue management so the mayor pattern can focus on coordination decisions.

Proven in fleet-mission and mayor-coordinator skills
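The 7-stage cycle can be sketched as one function per pass. The stage list comes from the description above; the stage bodies here are stubs, not GSD internals.

```python
def sling_cycle(queue, agents, state):
    """One pass of the 7-stage dispatch cycle (stages are real,
    bodies are illustrative stubs)."""
    if not queue or not agents:
        return state
    work = queue.pop(0)                          # 1. fetch available work
    agent = agents.pop(0)                        # 2. allocate to an agent
    ctx = {"item": work, "agent": agent}         # 3. prepare execution context
    result = f"{ctx['agent']}:{ctx['item']}"     # 4. dispatch (stubbed)
    ok = result is not None                      # 5. monitor (stubbed)
    if ok:
        state.setdefault("results", []).append(result)  # 6. collect results
    state["done"] = state.get("done", 0) + 1            # 7. update state
    agents.append(agent)  # agent returns to the available pool
    return state

state = {}
queue = ["deg-045", "deg-046"]
agents = ["polecat-1"]
while queue:
    sling_cycle(queue, agents, state)
print(state)  # {'results': ['polecat-1:deg-045', 'polecat-1:deg-046'], 'done': 2}
```

The mayor pattern sits above this loop and only makes coordination decisions; the queue mechanics stay in the cycle.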

Fleet Mission

Use when: launching N identical agents in parallel

Parallel agent dispatch with progress tracking and result aggregation. Unlike Gastown (heterogeneous work), Fleet Mission launches N agents with the same template but different inputs — like 5 research agents each covering 2 modules. Progress tracking and result merge are built in.

Proven in BLN (11 agents, 107K words, one session)
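Fleet Mission's shape — one template, N inputs, aggregated results — can be sketched directly. The `research_agent` stub is hypothetical; a real fleet would dispatch subagents.

```python
from concurrent.futures import ThreadPoolExecutor

def fleet_mission(template, inputs):
    """Launch one agent per input, all running the same template,
    and aggregate the results in input order."""
    with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
        return list(pool.map(template, inputs))

# Toy template: a "research agent" covering two modules, like BLN's Wave 1.
def research_agent(modules):
    return f"researched {' + '.join(modules)}"

assignments = [["m1", "m2"], ["m3", "m4"], ["m5", "m6"]]
print(fleet_mission(research_agent, assignments))
# ['researched m1 + m2', 'researched m3 + m4', 'researched m5 + m6']
```

Contrast with Gastown: here every worker is identical and only the input varies, so no mayor is needed to route heterogeneous work.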

Mayor-Coordinator

Use when: one orchestrator needs to manage multiple phases

A Northbridge coordination pattern where a single mayor creates convoys, dispatches work via sling, and monitors completion across multiple simultaneous workstreams. The mayor maintains the global view while polecats handle local execution. Includes stall detection and nudge-sync for stuck agents.

Proven in multi-phase autonomous execution (/gsd:autonomous)

New Subagent Fields (Claude Code v2.1.88+)

The latest Claude Code release added powerful subagent configuration fields that these patterns leverage.

6. The Hook System

Hooks provide deterministic automation — behaviors that fire reliably every time, independent of the LLM's reasoning. Unlike skills (which influence the LLM's behavior), hooks execute scripts before or after tool calls and cannot be overridden by the model.

Available Hook Points

PreToolUse: runs before a tool call; can block execution.
PostToolUse: runs after a tool call; can capture state.
Notification: fires on assistant notifications.
PostCompact: runs after context compaction; used for session recovery.
FileChanged: fires when a file is modified; used for auto-formatting.
PermissionDenied: fires when the model lacks permission; used for escalation.

Example: Commit Validation Hook

A PreToolUse hook on Bash intercepts every git commit command and validates the message against Conventional Commits format before allowing it to execute:

{
  "event": "PreToolUse",
  "tool": "Bash",
  "if_conditions": { "command_contains": "git commit" },
  "script": "validate-conventional-commit.sh",
  "action": "block_on_failure"
}

This fires every time, regardless of which skill is active or what the LLM's instructions say. The commit either passes validation or it does not. Deterministic.
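The referenced validate-conventional-commit.sh is not shown in this guide; a minimal equivalent check, sketched in Python under the standard Conventional Commits grammar:

```python
import re

# Conventional Commits first line: type(optional scope)!: description
PATTERN = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9\-]+\))?!?: .+"
)

def valid_commit(message):
    """Return True if the commit message's first line matches
    the Conventional Commits format."""
    first = message.splitlines()[0] if message else ""
    return bool(PATTERN.match(first))

print(valid_commit("feat(waves): add barrier between wave 1 and 2"))  # True
print(valid_commit("updated stuff"))                                  # False
```

A hook script like this exits nonzero on failure, which is what lets the PreToolUse event block the commit.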

Hook + Skill Composition

The hook system and skill system work in complementary layers. Skills guide the LLM's reasoning ("use conventional commits, include a scope"). Hooks enforce invariants the LLM must not violate ("this commit message does not match the format — blocked"). The combination produces reliable behavior: the skill makes the model want to do the right thing; the hook ensures it.

7. NASA Systems Engineering Methodology

GSD's planning methodology is adapted from NASA's systems engineering process. The core insight: planning is the hard part; once the plans are done the code is easy. The methodology has three levels, each building on the previous.

Level 1: Requirements

What must be true when this phase is complete? Requirements are concrete, verifiable, and scoped. They live in REQUIREMENTS.md and use the pattern: "The system shall [verb] [object] [constraint]." Requirements are written before any implementation discussion.

Level 2: Plan

How will the requirements be satisfied? Plans decompose into tasks, specify file paths, define wave assignments, and estimate model profiles. The plan is the execution contract — agents follow it literally. Plans live in PLAN.md and reference specific requirements by ID.

Level 3: Execution

The plan is executed by agents following the wave structure. Each task maps to a commit (or small set of commits). Execution is repeatable: given the same plan and the same model, you get substantially the same output.

Pre-Execution Intelligence

The methodology's secret weapon is answering questions before building. The /gsd:discuss-phase command gathers context through adaptive questioning. The /gsd:research-phase command produces research documents. The /gsd:list-phase-assumptions command surfaces hidden assumptions. By the time execution begins, there are no open questions — only tasks.

Verification

Verification checks what was built against what was planned. The /gsd:verify-work command performs goal-backward analysis: start from each requirement, trace forward to find the artifact that satisfies it. Any requirement without a corresponding deliverable is a gap. The 4-level verification ladder (static, artifact, behavioral, human) ensures thoroughness.
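Goal-backward analysis reduces to a requirement-to-artifact trace; anything unmatched is a gap. The data shapes below (requirement IDs mapped to expected paths) are illustrative.

```python
def goal_backward(requirements, artifacts):
    """For each requirement, trace forward to the deliverable that
    satisfies it. Returns (passed, gaps); any gap fails verification."""
    gaps = {rid: path for rid, path in requirements.items()
            if path not in artifacts}
    return (len(gaps) == 0, gaps)

requirements = {
    "REQ-1": "chapters/ch1.md",
    "REQ-2": "glossary.md",
    "REQ-3": "chapters/ch4.md",
}
artifacts = {"chapters/ch1.md", "glossary.md"}  # what actually exists on disk

passed, gaps = goal_backward(requirements, artifacts)
print(passed, gaps)  # False {'REQ-3': 'chapters/ch4.md'}
```

Starting from requirements rather than artifacts is what catches the silent failures: extra artifacts are harmless, but an unsatisfied requirement never hides.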

8. State Management

GSD is a disk-driven state machine. All state lives in files — no databases, no external services, no hidden memory. This makes the system inspectable, diffable, and crash-recoverable.

The .planning/ Directory

The .planning/ directory is gitignored by design; it contains all mutable project state.

Cross-Session Persistence

MEMORY.md provides cross-session persistence for Claude agents. It is structured into HOT (always in context), WARM (loaded at session start), and COLD (referenced on demand) sections. The session-awareness skill manages session recovery, and context-handoff creates structured handoff documents when sessions end.

Beads-State

For multi-agent orchestration, the beads-state system provides git-friendly, crash-recoverable state persistence. Agent identities, work assignments, progress counters, and completion status are all stored as flat files that survive process crashes and context compactions. The PostCompact hook triggers state recovery automatically.
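Crash-recoverable flat-file state usually hinges on atomic replacement: write to a temp file, then rename. The JSON encoding and field names here are assumptions, not beads-state's actual format.

```python
import json, os, tempfile

def save_state(path, state):
    """Write state atomically: write a temp file, then rename it over
    the target. A crash mid-write leaves the old file intact."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename

def load_state(path, default=None):
    """Read state back; a missing file means a fresh start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default if default is not None else {}

save_state("beads.json", {"agent": "polecat-2", "done": 7, "of": 10})
print(load_state("beads.json")["done"])  # 7
```

Recovery after a crash or compaction is then just `load_state`, which is the property the PostCompact hook relies on.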

Crash Recovery

Because all state is on disk, crash recovery is reading files. The /gsd:resume-work command reads the latest state, reconstructs context, and resumes execution from where it left off. The /gsd:health command diagnoses and repairs state inconsistencies. No work is lost to crashed sessions.

9. Model Profiles

Different tasks need different levels of intelligence. GSD provides three profiles that control which model is assigned to each execution role. Switch profiles with /gsd:set-profile.

Role | Quality | Balanced | Budget
Orchestration (mayor, coordinator) | Opus | Sonnet | Sonnet
Content production (polecats, fleet) | Sonnet | Sonnet | Haiku
Synthesis (cross-module, verification) | Opus | Sonnet | Sonnet
Scaffolding (schemas, templates) | Haiku | Haiku | Haiku
Planning (requirements, research) | Opus | Sonnet | Haiku

Quality profile uses Opus for all reasoning-heavy work. Best for complex milestones where correctness matters more than cost. The BLN textbook used this profile.

Balanced profile is the default. Sonnet handles most work, Haiku handles scaffolding. Good for daily development where throughput and cost are both factors.

Budget profile pushes Haiku as far as it can go. Use for mechanical work: generating boilerplate, running formatting passes, producing repetitive content from templates. Not recommended for research or synthesis.

10. Real-World Case Studies

BLN Textbook — Publication-Quality Documentation at Scale

11 agents · 4 waves · 107K words research · 329-page PDF · ~761K tokens

The Blender User Manual demonstrated the full Vision-to-Mission pipeline. A mission pack defined 10 research modules covering every domain of Blender 4.4. Five research agents ran in parallel (Wave 1), each producing ~21K words of structured technical content. Six chapter expansion agents (Wave 2) transformed research into publishable chapters with cross-references and a unified glossary. Wave 3 assembled the PDF. The verification matrix scored 12/12 PASS. The entire mission — from vision document to published PDF — completed in a single session.

View the BLN research project

HEL Research — Deep Investigation at Volume

9 agents · 28 documents · 91K words · HTML + PDF output

The Helium Supply Chain research project tackled a sprawling, technical subject: helium-3 supply chains, semiconductor fabrication, Pacific Rim geopolitics, and the physics of cryogenic systems. Nine agents produced 28 research documents, each standing alone as a complete reference. The publish-pipeline skill generated both HTML pages and a formatted PDF. This proved the system could handle genuinely difficult research — not just text generation, but synthesis across multiple technical domains.

View the HEL research project

Seattle 360 Engine — Autonomous Release Pipeline

64 releases · 57/360 degrees · configuration-driven · fully autonomous

The Seattle 360 engine is a configuration-driven pipeline: for each degree of the compass, research a Pacific Northwest topic, pair it with a species from the Sound of Puget Sound series, generate an SVG line art seed, produce research documents, publish to the web, and create a GitHub release. The pipeline ran 64 autonomous releases (v1.49.135 through v1.49.198) with GUPP driving proactive execution. Each release follows the full plan-execute-verify cycle at compact scale. The engine proved that the GSD patterns work not just for large missions but for sustained, repetitive production.

View the S36 research project

11. Common Pitfalls

Agent Supervision Neglect

Agents can drift from their instructions. They may read files outside their scope, write to unexpected locations, or interpret ambiguous instructions creatively.

Fix: Always check reads and writes against the plan's file list. The witness-observer skill automates this. If an agent's behavior diverges from instructions, pause and correct.

Task Overloading

Assigning too many tasks to a single agent leads to context exhaustion. The agent starts forgetting earlier instructions or producing lower-quality output for later tasks.

Fix: 5-6 tasks per agent maximum. One agent per milestone. If a milestone has 20 tasks, split across 4 agents in a fleet pattern.
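The fix above is just batching; a sketch of splitting a milestone's task list across a fleet, capped per agent:

```python
def split_tasks(tasks, max_per_agent=5):
    """Split a task list into per-agent batches, capped at
    max_per_agent to avoid context exhaustion."""
    return [tasks[i:i + max_per_agent]
            for i in range(0, len(tasks), max_per_agent)]

tasks = [f"task-{n}" for n in range(20)]
batches = split_tasks(tasks)
print(len(batches))  # 4 agents for a 20-task milestone
print(batches[0])    # ['task-0', 'task-1', 'task-2', 'task-3', 'task-4']
```

Each batch then becomes one input to a fleet-pattern dispatch.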

Context Exhaustion Without Handoff

Long-running sessions exhaust the context window. When compaction triggers, critical state can be lost if it was only in the conversation, not on disk.

Fix: Use the context-handoff skill proactively. Write state to .planning/ files. The beads-state system and PostCompact hook provide automatic recovery, but the best defense is never relying on conversation-only state.

Agent Failure Without Fallback

Subagents can fail — context errors, tool permission issues, or unexpected states. If your only path to completion is through a working agent, one failure blocks everything.

Fix: Always have a fallback plan to write directly. If 3 of 4 agents succeed and 1 fails, the coordinator writes the remaining content directly rather than re-dispatching.

State Corruption

Concurrent writes to .planning/ files, interrupted saves, or manual edits that violate expected formats can corrupt project state.

Fix: Run /gsd:health to diagnose. The command detects common issues (missing files, invalid JSON, orphaned phases) and offers repair. For severe corruption, /gsd:forensics performs a post-mortem investigation.

Skipping Research

Jumping straight to execution without the research phase produces superficial output. The model generates plausible-sounding content that lacks depth and accuracy.

Fix: Never skip the research wave. Even for "simple" tasks, /gsd:discuss-phase with --auto takes 30 seconds and catches assumptions that would cost hours to fix later. "Planning is the hard part, once the plans are done the code is easy."

12. Next Steps