Advanced Technical Guide

GSD Skill Creator — Deep Systems

Vision-to-Mission pipelines, NASA SE methodology, Apollo AGC simulation, cloud operations, the Deterministic Agent Communication Protocol, and what shipped in the v1.49 branch.

01

The Vision-to-Mission Pipeline

The Problem It Solves

The most expensive mistake in any AI-assisted project is asking questions during execution that could have been answered before execution started. Every time Claude stops building to ask "wait, do you want PostgreSQL or MySQL?" or "should this be a REST API or GraphQL?", the context window fills with decision-making conversation instead of implementation work. Multiply this across a 30-phase project, and you've wasted thousands of tokens on avoidable interruptions.

The Vision-to-Mission (VTM) pipeline solves this by doing all the thinking, researching, and decision-making before a single line of code is written. It takes a vision document — a plain-English description of what you want to build — and transforms it into a fully specified mission package with research, component specs, wave plans, model assignments, and test plans. By the time execution starts, there are no open questions. Every spec has been researched, every dependency has been mapped, and every task knows exactly which model and token budget it needs.

This is why experienced Skill Creator users report dramatically faster builds. Not because the code is written faster, but because the execution phase has almost zero interruptions. The system already knows the answers.

The Seven Stages

Parse
Validate
Classify
Research
Assemble
Plan Waves
Assign Models

1. Vision Parser. Takes a Markdown vision document and extracts its sections with regex-based parsing. Produces a typed VisionDocument object with Zod-validated fields.

2. Vision Validator. Runs structural validation and quality checks. Are all required sections present? Are constraints specific enough? Are success criteria measurable? Produces diagnostics with severity levels.

3. Archetype Classifier. Categorizes the project into one of four archetypes — Educational, Infrastructure, Organizational, or Creative — which determines how research, planning, and execution are weighted. An infrastructure project gets more safety research and deployment verification. An educational project gets more pedagogical scaffolding and assessment design.

4. Research Compiler. This is where the pre-execution intelligence happens. The compiler takes the vision document's technical requirements and compiles a research package with tiered knowledge chunking.

The compiler also runs a source quality checker, a safety boundary extractor, and a research necessity detector that determines which topics need fresh investigation versus which can use existing skill knowledge.

5. Mission Assembler. Converts the validated vision + compiled research into a mission package: self-contained component specs, milestone definitions, and test plans. Each component spec declares its dependencies, its outputs (the spec's produces field), its verification criteria, and its model recommendation.

6. Wave Planner. This is the parallel execution optimizer. See below.

7. Model Assignment Engine. Assigns Opus, Sonnet, or Haiku to each component based on weighted signals: complexity, safety criticality, historical drift rates, and token budget. Enforces a 60/40 budget principle (60% of tokens for implementation, 40% for verification and overhead). Only downgrades are automatic; upgrades require human approval.
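The assignment logic can be sketched in TypeScript. The weights, thresholds, and signal names below are illustrative assumptions; only the 60/40 split and the downgrade-only automation come from the description above.

```typescript
type Model = "haiku" | "sonnet" | "opus";

interface Signals {
  complexity: number;      // 0..1, assumed normalization
  safetyCritical: number;  // 0..1
  driftRate: number;       // historical drift for this component type, 0..1
}

// Weighted scoring; the weights and cut points here are illustrative guesses.
function recommendModel(s: Signals): Model {
  const score = 0.5 * s.complexity + 0.3 * s.safetyCritical + 0.2 * s.driftRate;
  if (score >= 0.7) return "opus";
  if (score >= 0.35) return "sonnet";
  return "haiku";
}

// Downgrades apply automatically; upgrades are queued for human approval.
function applyAssignment(
  current: Model,
  recommended: Model
): { model: Model; needsApproval: boolean } {
  const rank: Record<Model, number> = { haiku: 0, sonnet: 1, opus: 2 };
  if (rank[recommended] <= rank[current]) {
    return { model: recommended, needsApproval: false };
  }
  return { model: current, needsApproval: true };
}

// 60/40 budget principle: implementation vs. verification and overhead.
function splitBudget(totalTokens: number): { implementation: number; verification: number } {
  return {
    implementation: Math.floor(totalTokens * 0.6),
    verification: Math.ceil(totalTokens * 0.4),
  };
}
```

Used this way, a safety-critical, high-drift component lands on Opus, while a proposed upgrade from Sonnet stays put until a human approves it.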

Wave Planning & Parallel Execution

The wave planner is a graph-coloring algorithm that maximizes parallelism while respecting dependency ordering. It decomposes all component specs into dependency-ordered waves:

Wave 0 — Foundation (always sequential)

Type definitions, interface contracts, schemas, and configuration. Anything with no dependencies and containing keywords like "types," "interfaces," "schema," or "config" is forced into Wave 0. This wave must complete before anything else starts. It's the ground truth that everything else builds on.

Wave 1+ — Parallel Tracks

The planner builds a dependency graph, identifies the critical path, then groups non-conflicting specs into concurrent tracks. Each track can execute independently — different agents, different context windows, full parallelism. The planner calculates a "sequential savings" metric showing how much faster wave execution is compared to running everything in sequence.
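A minimal sketch of the wave-grouping step, modeled here as dependency-depth leveling (the spec names and the exact algorithm are assumptions; the real planner also handles conflict grouping and critical-path analysis):

```typescript
interface Spec {
  id: string;
  deps: string[]; // ids of specs this one depends on
}

// Group specs into waves: a spec's wave index is one past its deepest dependency.
function planWaves(specs: Spec[]): string[][] {
  const depth = new Map<string, number>();
  const byId = new Map(specs.map(s => [s.id, s]));

  function depthOf(id: string, seen: Set<string> = new Set()): number {
    if (depth.has(id)) return depth.get(id)!;
    if (seen.has(id)) throw new Error(`dependency cycle at ${id}`);
    seen.add(id);
    const spec = byId.get(id);
    if (!spec) throw new Error(`unknown spec ${id}`);
    const d =
      spec.deps.length === 0
        ? 0 // no dependencies: Wave 0
        : 1 + Math.max(...spec.deps.map(dep => depthOf(dep, seen)));
    depth.set(id, d);
    return d;
  }

  const waves: string[][] = [];
  for (const s of specs) {
    const d = depthOf(s.id);
    (waves[d] ??= []).push(s.id);
  }
  return waves;
}
```

Specs in the same wave share no dependency path, so they can run as parallel tracks; the number of waves versus the number of specs is what the "sequential savings" metric compares.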

Here's how it actually looked in v1.49's execution:

Wave 0 (sequential):  Phase 446 → Phase 447       # types, then bundle format
Wave 1 (parallel):    Phase 448 | Phase 449        # assembler AND interpreter
Wave 2 (parallel):    Phase 450+451 | Phase 452+453 # retro+skills AND templates+bus
Wave 3 (parallel):    Phase 454 | Phase 455        # dashboard AND CLI
Wave 4 (sequential):  Phase 456                     # verification (depends on all)

Wave 0 builds the type system. Wave 1 runs the assembler and interpreter in parallel because they depend on types but not each other. Wave 2 runs two pairs of related phases simultaneously. Only the final verification phase runs alone because it needs to test everything.

The Cache Optimizer

The cache optimizer analyzes the mission package and identifies opportunities to share context across phases.

This is important because token budget is the fundamental constraint. Every token spent re-loading the same skill into a new context window is a token that could have been spent on implementation. The cache optimizer minimizes waste.

02

Pre-Execution Intelligence

The VTM pipeline is the most visible form of pre-execution intelligence, but the principle runs deeper through the entire system. The idea is that every question that can be answered before execution should be answered before execution.

In GSD alone, this happens at two points: the discuss-phase step captures your implementation preferences, and plan-phase spawns parallel research agents to investigate the domain. Skill Creator extends the same principle through every stage of the VTM pipeline.

Some things are genuinely unavoidable — the user changes their mind, a library has an undocumented bug, a test reveals a flawed assumption. That's why state tracking exists.

03

State Tracking & Human-in-the-Loop

The system maintains persistent state in several files that survive across sessions, context window resets, and even system interruptions:

| File | Contents | Survives |
| --- | --- | --- |
| STATE.md | Current position, decisions made, blockers, context for the next session | Session restart, context reset |
| ROADMAP.md | Phase completion status, what's done, what's next | Entire project lifecycle |
| {phase}-SUMMARY.md | What happened in each phase, what changed, verification results | Forever (committed to git) |
| {phase}-VERIFICATION.md | Automated verification results against phase goals | Forever |
| .planning/patterns/ | Session observations, feedback, suggestions (append-only JSONL) | Configured retention (default 90 days) |

When you run /gsd:pause-work, the system creates a handoff document capturing exactly where things stand. /gsd:resume-work reads it and picks up precisely where you left off. The executor agent doesn't need to scan the codebase to figure out what happened — it reads the state files.

Human-in-the-Loop Is Non-Negotiable

Even in YOLO mode (where GSD auto-approves steps), human gates exist at critical points.

The philosophy is: automate the mundane, gate the consequential. The system can auto-commit code, auto-load skills, auto-assign models. It cannot auto-approve architectural changes, auto-deploy to production, or auto-bypass safety checks.

04

How the Scheduling System Works

Scheduling happens at three levels: the GSD phase scheduler, the chipset kernel scheduler, and the wave execution scheduler. They work together but operate at different scales.

GSD Phase Scheduling

GSD's execute-phase command takes the plans created during plan-phase and groups them into execution waves. Plans that share no dependencies run in parallel. Plans with dependencies run sequentially. Each plan gets its own fresh 200K-token context window — the executor agent sees only the plan, the relevant skills, and the project state. No accumulated conversation history.

Parallel execution uses subagent spawning: the main orchestrator forks child contexts, each executing independently, then collects and integrates results. The orchestrator's own context stays lean — at 30–40% utilization even during large phases — because the heavy work happens in the subagent contexts.

Chipset Kernel Scheduling

The ExecKernel is a tick-driven scheduler that manages the four engine domains (context, output, io, glue). Each kernel tick runs three operations:

  1. Schedule — The ExecScheduler checks which engines are awake, which are sleeping, and which have pending work. Engines sleep when idle and wake on signal.
  2. Route — Pending messages on each engine's inbound MessagePort are processed. Messages are typed (KernelMessage) and routed through FIFO transport with reply-based ownership.
  3. Budget — The BudgetManager tracks token consumption per engine against allocated percentages. If an engine exceeds its budget, it's flagged and can be throttled.

Engines coordinate using a 32-bit signal mask (bits 0–15 system-reserved, 16–31 user-allocatable). Signals are lightweight — a single bit flip is cheaper than sending a full message. An engine can wait on a signal bit, sleep until it's set, then wake and process. This is the same interrupt-driven scheduling pattern that the original Amiga Exec used for inter-process communication.
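The signal-mask convention can be sketched directly; the bit allocation (0-15 system-reserved, 16-31 user-allocatable) follows the description above, while the function names are illustrative.

```typescript
// Allocate the first free bit in the user range (bits 16-31).
function allocateUserSignal(inUse: number): number {
  for (let bit = 16; bit < 32; bit++) {
    const mask = (1 << bit) >>> 0;          // single user-range bit
    if ((inUse & mask) === 0) return mask;  // first free bit wins
  }
  throw new Error("no free user signal bits");
}

// Raising a signal is a single bit flip, cheaper than a full message.
function raiseSignal(pending: number, signal: number): number {
  return (pending | signal) >>> 0;
}

// An engine sleeping on waitMask wakes when any awaited bit becomes pending.
function shouldWake(pending: number, waitMask: number): boolean {
  return (pending & waitMask) !== 0;
}

// Clear the handled bit so the engine can go back to sleep.
function consumeSignal(pending: number, signal: number): number {
  return (pending & ~signal) >>> 0;
}
```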

Wave Execution Scheduling

The wave planner produces a WaveExecutionPlan — an ordered list of waves, each containing parallel tracks. The plan includes a critical path analysis (the longest dependency chain that determines minimum execution time) and a sequential savings metric. During execution, the kernel uses this plan to orchestrate subagent spawning: one subagent per track, all tracks in a wave launched simultaneously, wave boundaries enforced as synchronization barriers.

05

How Verification Actually Works

Verification is not a checkbox. It's a multi-layer system that catches problems at different granularities.

Task-Level Verification

Every task plan contains a <verify> block with a concrete, executable test. "Run the migration, check the table exists with the correct columns." The executor runs this after completing the task. If it fails, the task is marked incomplete and the orchestrator decides whether to retry, debug, or escalate.

Phase-Level Verification

After all tasks in a phase complete, GSD runs verify-work. This spawns a verification agent that is independent of the executor — it has no context bleed from the implementation. It checks the codebase against the phase goals, runs tests, and produces a VERIFICATION.md report. If discrepancies are found, it spawns debug agents to diagnose root causes and creates fix plans.

User Acceptance Testing

The verification agent extracts testable deliverables and walks the user through them one at a time: "Can you create an account?" "Does the dashboard load?" "Is the data persisted after refresh?" Each is a binary yes/no. Failed items get automatic diagnostic analysis.

Milestone Auditing

/gsd:audit-milestone verifies the entire milestone against its definition of done. If gaps are found, /gsd:plan-milestone-gaps creates new phases to close them.

V&V Compliance (NASA SE)

For milestones using the NASA SE methodology (like the OpenStack deployment), verification goes further. A requirements verification matrix maps every requirement to a TAID verification method (Test, Analysis, Inspection, or Demonstration). Each method has specific acceptance criteria, a verification procedure, and traceability back to the original requirement. NPR 7123.1 Appendix H compliance is tracked with tailoring rationale.
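A matrix entry can be pictured as a small typed record with a coverage check over it; the field names here are assumptions, not the actual schema.

```typescript
type TaidMethod = "Test" | "Analysis" | "Inspection" | "Demonstration";

// Hypothetical shape of one row in the requirements verification matrix.
interface VerificationEntry {
  requirementId: string;
  method: TaidMethod;
  acceptanceCriteria: string;
  procedure: string;
}

// Return the requirements that no matrix row traces back to.
function uncovered(requirements: string[], matrix: VerificationEntry[]): string[] {
  const covered = new Set(matrix.map(e => e.requirementId));
  return requirements.filter(r => !covered.has(r));
}
```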

Safety-Critical Verification

For engineering packs, safety-critical tests are mandatory-pass. They cannot be skipped, deferred, or overridden. The v1.49 DACP introduced 8 safety-critical tests (SC-01 through SC-08) covering script non-execution, backward compatibility, cooldown enforcement, size limits, provenance enforcement, fidelity bounding, artifact immutability, and graceful degradation. All 8 must pass before a release ships.

06

NASA Systems Engineering Methodology

The OpenStack Cloud Platform (v1.33) is the flagship application of NASA SE methodology within Skill Creator. It maps the 7 NASA SE phases (Pre-Phase A through Phase F, per SP-6105 and NPR 7123.1) to cloud infrastructure operations:

| NASA Phase | Cloud Ops Mapping |
| --- | --- |
| Pre-Phase A: Concept Studies | Requirements gathering, architecture selection, reference design |
| Phase A: Concept & Technology Dev | Lab environment, proof of concept, technology evaluation |
| Phase B: Preliminary Design | Network topology, service architecture, security model |
| Phase C: Final Design & Fabrication | Kolla-Ansible configuration, secrets management, image builds |
| Phase D: System Assembly & Test | Deployment, integration testing, E2E verification |
| Phase E: Operations & Sustainment | Day-2 operations, monitoring, capacity management, incident response |
| Phase F: Closeout | Decommissioning in exact reverse of Phase D deployment order |

The key innovation is the communication framework: 9 typed communication loops (command, execution, specialist, user, observation, health, budget, cloud-ops, doc-sync) with priority-based bus arbitration. A HALT signal propagates within 1 cycle. A budget agent tracks token consumption and warns at 90%, blocks at 95%.
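The budget agent's thresholds are easy to sketch (90% warn, 95% block, per the description; the result type is an assumption):

```typescript
type BudgetStatus = "ok" | "warn" | "block";

// Warn at 90% of allocation, block at 95%.
function checkBudget(consumed: number, allocated: number): BudgetStatus {
  const ratio = consumed / allocated;
  if (ratio >= 0.95) return "block";
  if (ratio >= 0.9) return "warn";
  return "ok";
}
```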

Three mission crews (31 agents total) handle deployment, operations, and documentation. The crews use spacecraft-style role naming — CAPCOM as the sole human interface, SURGEON for health monitoring, CRAFT for documentation. Crew handoffs include full context transfer.

The V&V plan maps 55 requirements to TAID verification methods. 22 safety-critical test procedures verify that deployment gates cannot be bypassed, that rollback procedures work, and that service health monitoring catches degradation before users notice.

07

OpenStack Cloud Operations

The cloud operations system isn't just documentation — it's a complete operational skill pack with executable verification.

08

The Apollo AGC Simulator

The Apollo Guidance Computer (AGC) Block II simulator (v1.23) is both a working emulator and an educational curriculum. It's a deep-dive into real-time embedded systems design from 1966 — and it maps directly to how Skill Creator's own scheduling and interrupt systems work.

The Simulator

The DSKY Interface

An authentic Display and Keyboard model: relay decoding, 6 numeric registers, 11 annunciators (warning lights), 19-key keyboard, VERB/NOUN command processor, and an Executive Monitor with real-time scheduling visualization. Learn mode annotations explain what the AGC is doing at each step.

Development Tools

A yaYUL-compatible assembler, step debugger with breakpoints and watchpoints, disassembler, and rope loader for Virtual AGC format files. 54 validation tests verify instruction-level accuracy.

The Curriculum

11 chapters from orientation to AGC-to-GSD architectural patterns, with 8 hands-on exercises culminating in reproducing the Apollo 11 1202 alarm. The curriculum explicitly draws connections between the AGC's scheduling primitives and Skill Creator's modern equivalents — showing that the same coordination problems that NASA solved in 1966 are the same ones that AI agent orchestration faces today.

09

v1.49 — The Deterministic Agent Communication Protocol

DACP is the biggest architectural addition in the v1.49 branch. It replaces markdown-only agent handoffs with a structured, verifiable, adaptive protocol. The core insight: when Agent A hands work to Agent B using only prose, Agent B interprets the prose, and interpretation drift accumulates. After several handoffs, the work has diverged significantly from the original intent. DACP fixes this.

Three-Part Bundles

Every agent handoff now consists of three parts:

| Part | Format | Purpose |
| --- | --- | --- |
| Intent | Markdown (intent.md) | Human-readable description of what should happen and why. This is what a person reads to understand the handoff. |
| Data | JSON (data/) | Structured, schema-validated data. File paths, configuration values, dependency lists, test criteria — anything that must not be "interpreted." |
| Code | Scripts (code/) | Executable scripts with provenance tracking. If the handoff includes "run lint," the actual lint script is bundled — not a prose instruction to figure out linting. |

Bundles are directory-based: manifest.json, intent.md, data/, and code/, with an atomic .complete marker. Size limits enforce discipline: 50KB data payloads, 10KB per script, 100KB total. Every bundle includes a mandatory .msg fallback for backward compatibility with systems that only understand markdown handoffs.
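Size-limit enforcement can be sketched as a pure check; the limits follow the text, while the Bundle shape and function name are illustrative.

```typescript
const MAX_DATA_BYTES = 50 * 1024;    // 50KB data payloads
const MAX_SCRIPT_BYTES = 10 * 1024;  // 10KB per script
const MAX_TOTAL_BYTES = 100 * 1024;  // 100KB total

// Hypothetical pre-measured bundle; real code would stat the directory.
interface BundleSizes {
  intentBytes: number;
  dataBytes: number;
  scriptBytes: number[]; // one entry per script under code/
}

function validateBundleSize(b: BundleSizes): string[] {
  const errors: string[] = [];
  if (b.dataBytes > MAX_DATA_BYTES) errors.push("data/ exceeds 50KB");
  for (const [i, size] of b.scriptBytes.entries()) {
    if (size > MAX_SCRIPT_BYTES) errors.push(`script #${i} exceeds 10KB`);
  }
  const total = b.intentBytes + b.dataBytes + b.scriptBytes.reduce((a, n) => a + n, 0);
  if (total > MAX_TOTAL_BYTES) errors.push("bundle exceeds 100KB total");
  return errors;
}
```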

Adaptive Fidelity

Not every handoff needs a full three-part bundle. DACP uses an adaptive fidelity model with four levels:

| Level | Contents | When Used |
| --- | --- | --- |
| L0: Prose | Markdown only | Simple, low-risk handoffs where interpretation ambiguity is minimal |
| L1: Prose + Data | Markdown + JSON schemas | Handoffs with specific values that shouldn't be approximated |
| L2: Structured | Markdown + JSON + references | Standard development handoffs with schema-validated contracts |
| L3: Full Bundle | Markdown + JSON + executable scripts | Safety-critical, complex, or historically high-drift handoffs |

The fidelity decision engine considers five factors: data complexity, historical drift for this handoff type, available skills in the library, remaining token budget, and safety criticality. It achieved 95% accuracy across 20 test scenarios (exceeding the 85% target). Fidelity changes are bounded to a maximum of 1 level per cycle — the system can't jump from L0 to L3 in one step.
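The one-level bound can be sketched as a clamp; the signature assumed for clampFidelityChange() here is an illustration, not the actual API.

```typescript
type FidelityLevel = 0 | 1 | 2 | 3; // L0 prose .. L3 full bundle

// Bound any proposed fidelity change to at most one level per cycle.
function clampFidelityChange(current: FidelityLevel, proposed: FidelityLevel): FidelityLevel {
  const delta = proposed - current;
  if (delta > 1) return (current + 1) as FidelityLevel;
  if (delta < -1) return (current - 1) as FidelityLevel;
  return proposed;
}
```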

The Retrospective Analyzer

This is the learning loop applied to agent communication. After a handoff completes, the retrospective analyzer measures drift using a composite score with weighted components.

When drift exceeds 0.3, the analyzer recommends promoting that handoff type to a higher fidelity level. When drift stays below 0.05 consistently, it recommends demotion (saving tokens). Cooldown enforcement prevents thrashing: 7 days between promotions, 14 days between demotions. All recommendations require human approval.
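The promotion/demotion decision with cooldowns can be sketched as follows (the 0.3 and 0.05 thresholds and the 7/14-day cooldowns come from the text; the names are illustrative, and either outcome is only a recommendation pending human approval):

```typescript
const PROMOTE_COOLDOWN_DAYS = 7;
const DEMOTE_COOLDOWN_DAYS = 14;
const DAY_MS = 24 * 60 * 60 * 1000;

type Recommendation = "promote" | "demote" | "hold";

// Recommend a fidelity change only when drift crosses a threshold
// AND the per-pattern cooldown has elapsed.
function recommend(drift: number, lastChangeMs: number, nowMs: number): Recommendation {
  const daysSince = (nowMs - lastChangeMs) / DAY_MS;
  if (drift > 0.3 && daysSince >= PROMOTE_COOLDOWN_DAYS) return "promote";
  if (drift < 0.05 && daysSince >= DEMOTE_COOLDOWN_DAYS) return "demote";
  return "hold";
}
```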

The analyzer also detects patterns — recurring handoff types and their drift rates — and stores them in append-only JSONL for longitudinal analysis. Over time, the system learns exactly which handoff types need structure and which can safely use prose.

Safety Architecture

DACP's safety model has 8 mandatory requirements, each with a corresponding safety-critical test that must pass:

| ID | Requirement | Mechanism |
| --- | --- | --- |
| SAFE-01 | Scripts are never auto-executed | Object.freeze on all script references |
| SAFE-02 | Fidelity changes bounded to ±1 | clampFidelityChange() enforced at the type level |
| SAFE-03 | No unprovenanced data | Script catalog rejects entries without valid provenance |
| SAFE-04 | Graceful degradation | tryLoadBundle returns null, never throws |
| SAFE-05 | Cooldowns enforced | 7-day promote, 14-day demote per pattern |
| SAFE-06 | Provenance chain enforcement | Scripts without valid source skill + version are rejected |
| SAFE-07 | Size limits enforced | 50KB data, 10KB script, 100KB total |
| SAFE-08 | Backward compatibility | Every bundle has a .msg fallback |

10

v1.49.1–v1.49.6 — What Else Shipped

The v1.49 branch didn't stop at DACP. Six patch releases followed, each addressing real-world deployment issues:

v1.49.4 — Filesystem Management

Introduced a zone-based filesystem organization: projects/ for GSD projects, contrib/ for upstream/downstream collaboration, packs/ for educational content, and www/ for web staging. A .sc-config.json with Zod validation supports external project directories. 8 new CLI commands (sc project init, sc pack list, sc contrib status, etc.) make navigation instant. Path traversal prevention rejects .., /, and \ in project names before any filesystem operation touches disk.
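The name check can be sketched as a simple predicate; the rejected sequences (.., /, \) follow the text, and the function name is illustrative.

```typescript
// Reject project names that could escape the projects/ zone,
// before any filesystem operation touches disk.
function isSafeProjectName(name: string): boolean {
  if (name.length === 0) return false;
  if (name.includes("..")) return false;                 // parent-directory traversal
  if (name.includes("/") || name.includes("\\")) return false; // path separators
  return true;
}
```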

v1.49.5 — Linux FHS & XDG Compliance

A deep refactor following Linux open source conventions. Root directories reduced from 33 to 26. Added scdoc man pages, shell completions for bash/zsh/fish, a freedesktop.org .desktop entry, AppStream metadata, and a systemd user service unit for headless agent mode. Built Debian and RPM packaging infrastructure. Implemented XDG Base Directory utilities in both TypeScript and Rust, with relative-path rejection in both implementations. 103 files changed, zero regressions across 19,222 tests.

v1.49.6 — macOS Compatibility & Dependency Hardening

Fixed a C++ mutex crash in onnxruntime-node on macOS (a static destruction order race), replaced Bash 4.3+ features with POSIX-compatible alternatives for Bash 3.2 (macOS ships Bash 3.2, the last GPLv2 release, and will not upgrade), and eliminated the natural NLP package — which pulled in 300MB of unnecessary dependencies, including PostgreSQL, MongoDB, and Redis drivers — replacing it with 250 lines of hand-rolled, zero-dependency TF-IDF and Naive Bayes implementations. The @huggingface/transformers package moved to optional, with automatic fallback to heuristic scoring.
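A zero-dependency TF-IDF in the spirit of that hand-rolled replacement fits in a few lines; this sketch is illustrative, not the shipped implementation.

```typescript
// Compute per-document TF-IDF scores for pre-tokenized documents.
function tfidf(docs: string[][]): Map<string, number>[] {
  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  return docs.map(doc => {
    // Term frequency within this document.
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);
    const scores = new Map<string, number>();
    for (const [term, count] of tf) {
      const idf = Math.log(docs.length / df.get(term)!);
      scores.set(term, (count / doc.length) * idf);
    }
    return scores;
  });
}
```

Terms that appear in every document score zero (their IDF is log 1), which is exactly the discrimination property heuristic skill-relevance scoring needs.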

The v1.49 branch in numbers: 68 requirements for DACP core, 49 for filesystem management, 49 for FHS compliance. 19,222 tests total. TypeScript compilation errors reduced from 219 to zero. Three platforms verified (Linux, macOS, Windows via WSL).

11

v1.48 — Physical Infrastructure Engineering

The Physical Infrastructure Engineering Pack (v1.48) demonstrates VTM at full scale: an 80-requirement milestone executed across 12 phases, 30 plans, and 52 commits in 3 sessions (~4.5 hours wall clock).

The Safety Warden runs in a non-bypassable architecture: every engineering calculation passes through safety before it reaches the user. PE (Professional Engineer) disclaimers are structural, not configurable. The warden was implemented as Wave 0 — before any domain skill — so there was never a moment during development when a calculation could exist without safety annotations.

The retrospective noted that 80-requirement milestones need 3 sessions, with context budget exhaustion predictable at about 25–30 requirements per session. Sessions split naturally at wave boundaries. The VTM-to-GSD pipeline was used for the seventh consecutive milestone, confirming the pattern's reliability.

12

The Design Philosophy Behind All of It

Everything in this guide — VTM pipelines, wave planners, NASA SE phases, AGC simulators, DACP bundles, safety wardens — traces back to one principle:

The AMIGA Principle

Specialized, constrained building blocks composed intelligently will outperform general-purpose brute force. A skill that uses 2% of the context window is more powerful than a mega-prompt that uses 30%, because the small skill can compose with twenty others. A three-part handoff bundle with schema validation catches drift that prose instructions never would. A wave planner that maximizes parallelism finishes in half the time of sequential execution. None of these components is individually complex. Their power comes from composition.

The Amiga's custom chips — Agnus, Denise, Paula, Gary — were not individually powerful. They were precisely constrained for their domains, and they composed through a shared bus architecture. A 7 MHz system outperformed 16 MHz competitors because every transistor was doing work that mattered, and nothing fought itself.

Skill Creator applies this insight to AI agent orchestration. Skills are custom chips. Agents are chipset configurations. Teams are system architectures. The kernel is the scheduler. Copper lists are declarative workflow programs. The token budget is the memory bus bandwidth. And the AMIGA Principle — that the right constraints, applied to the right components, composed through the right interfaces, produce capabilities that exceed the sum of their parts — is the through-line that makes all of it work.

The system has grown from 6 capabilities in v1.0 to 139 in v1.33, with 19,222 tests as of v1.49.5. It has produced approximately 350,000 lines of code across 32 milestones, 278+ phases, and 740+ plans. Every one of those milestones was built using the same system it was building — the ultimate test of whether the tools actually work.