This guide covers Vision-to-Mission pipelines, NASA SE methodology, Apollo AGC simulation, cloud operations, the Deterministic Agent Communication Protocol, and what shipped in the v1.49 branch.
The most expensive mistake in any AI-assisted project is asking questions during execution that could have been answered before execution started. Every time Claude stops building to ask "wait, do you want PostgreSQL or MySQL?" or "should this be a REST API or GraphQL?", the context window fills with decision-making conversation instead of implementation work. Multiply this across a 30-phase project, and you've wasted thousands of tokens on avoidable interruptions.
The Vision-to-Mission (VTM) pipeline solves this by doing all the thinking, researching, and decision-making before a single line of code is written. It takes a vision document — a plain-English description of what you want to build — and transforms it into a fully specified mission package with research, component specs, wave plans, model assignments, and test plans. By the time execution starts, there are no open questions. Every spec has been researched, every dependency has been mapped, and every task knows exactly which model and token budget it needs.
This is why experienced Skill Creator users report dramatically faster builds. Not because the code is written faster, but because the execution phase has almost zero interruptions. The system already knows the answers.
1. Vision Parser. Takes a Markdown vision document and extracts structured sections using regex-based extraction. Produces a typed VisionDocument object with Zod-validated fields.
2. Vision Validator. Runs structural validation and quality checks. Are all required sections present? Are constraints specific enough? Are success criteria measurable? Produces diagnostics with severity levels.
3. Archetype Classifier. Categorizes the project into one of four archetypes — Educational, Infrastructure, Organizational, or Creative — which determines how research, planning, and execution are weighted. An infrastructure project gets more safety research and deployment verification. An educational project gets more pedagogical scaffolding and assessment design.
4. Research Compiler. This is where the pre-execution intelligence happens. The compiler takes the vision document's technical requirements and compiles a research package with tiered knowledge chunking.
The compiler also runs a source quality checker, a safety boundary extractor, and a research necessity detector that determines which topics need fresh investigation versus which can use existing skill knowledge.
5. Mission Assembler. Converts the validated vision + compiled research into a mission package: self-contained component specs, milestone definitions, and test plans. Each component spec declares its dependencies, its produces (outputs), its verification criteria, and its model recommendation.
6. Wave Planner. This is the parallel execution optimizer. See below.
7. Model Assignment Engine. Assigns Opus, Sonnet, or Haiku to each component based on weighted signals: complexity, safety criticality, historical drift rates, and token budget. Enforces a 60/40 budget principle (60% of tokens for implementation, 40% for verification and overhead). Only downgrades are automatic; upgrades require human approval.
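The Model Assignment Engine's weighted-signal scoring, 60/40 budget split, and downgrade-only automation can be sketched roughly as follows. The signal names, weights, and function shapes here are illustrative assumptions, not the shipped implementation:

```typescript
// Hypothetical sketch of model assignment. Weights and thresholds are
// illustrative; only the 60/40 split and downgrade-only rule come from the text.
type Model = "opus" | "sonnet" | "haiku";

interface ComponentSignals {
  complexity: number;      // 0..1, from spec analysis
  safetyCritical: boolean; // safety-critical components never drop below Opus
  driftRate: number;       // 0..1, historical interpretation drift
}

function recommendModel(s: ComponentSignals): Model {
  // Weighted composite: complexity and drift push the score up.
  const score = 0.5 * s.complexity + 0.3 * s.driftRate;
  if (s.safetyCritical || score >= 0.7) return "opus";
  return score >= 0.35 ? "sonnet" : "haiku";
}

// 60/40 budget principle: implementation vs. verification and overhead.
function splitBudget(totalTokens: number) {
  const implementation = Math.round(totalTokens * 0.6);
  return { implementation, verification: totalTokens - implementation };
}

// Only downgrades are automatic; an upgrade is flagged for human approval.
function applyOverride(current: Model, requested: Model): Model | "needs-approval" {
  const rank: Record<Model, number> = { haiku: 0, sonnet: 1, opus: 2 };
  return rank[requested] <= rank[current] ? requested : "needs-approval";
}
```

A safety-critical component is pinned to Opus regardless of its composite score, mirroring the rule that safety criticality overrides cost signals.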
The wave planner is a graph-coloring algorithm that maximizes parallelism while respecting dependency ordering. It decomposes all component specs into dependency-ordered waves:
Type definitions, interface contracts, schemas, and configuration. Anything with no dependencies and containing keywords like "types," "interfaces," "schema," or "config" is forced into Wave 0. This wave must complete before anything else starts. It's the ground truth that everything else builds on.
The planner builds a dependency graph, identifies the critical path, then groups non-conflicting specs into concurrent tracks. Each track can execute independently — different agents, different context windows, full parallelism. The planner calculates a "sequential savings" metric showing how much faster wave execution is compared to running everything in sequence.
Here's how it actually looked in v1.49's execution:
```
Wave 0 (sequential): Phase 446 → Phase 447          # types, then bundle format
Wave 1 (parallel):   Phase 448 | Phase 449          # assembler AND interpreter
Wave 2 (parallel):   Phase 450+451 | Phase 452+453  # retro+skills AND templates+bus
Wave 3 (parallel):   Phase 454 | Phase 455          # dashboard AND CLI
Wave 4 (sequential): Phase 456                      # verification (depends on all)
```
Wave 0 builds the type system. Wave 1 runs the assembler and interpreter in parallel because they depend on types but not each other. Wave 2 runs two pairs of related phases simultaneously. Only the final verification phase runs alone because it needs to test everything.
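The wave-grouping logic described above can be sketched as a simple dependency-leveling pass. This is a minimal approximation under assumed data shapes — the real planner also handles keyword-forced Wave 0 placement and track conflict grouping:

```typescript
// Minimal sketch of dependency-ordered wave grouping (structure assumed).
interface Spec { id: string; deps: string[] }

function planWaves(specs: Spec[]): string[][] {
  const waves: string[][] = [];
  const placed = new Set<string>();
  let remaining = specs.slice();
  while (remaining.length > 0) {
    // A spec joins the current wave when all its deps are already placed.
    const wave = remaining.filter(s => s.deps.every(d => placed.has(d)));
    if (wave.length === 0) throw new Error("dependency cycle detected");
    for (const s of wave) placed.add(s.id);
    waves.push(wave.map(s => s.id));
    remaining = remaining.filter(s => !placed.has(s.id));
  }
  return waves;
}

// "Sequential savings": wave count vs. running every spec one after
// another, assuming unit cost per spec.
function sequentialSavings(waves: string[][]): number {
  const total = waves.reduce((n, w) => n + w.length, 0);
  return 1 - waves.length / total;
}
```

Feeding in a types spec, two specs that depend only on types, and a final verification spec reproduces the shape of the v1.49 plan: one sequential wave, one parallel wave, one closing wave.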
The cache optimizer analyzes the mission package and identifies opportunities to share context across phases.
This is important because token budget is the fundamental constraint. Every token spent re-loading the same skill into a new context window is a token that could have been spent on implementation. The cache optimizer minimizes waste.
The VTM pipeline is the most visible form of pre-execution intelligence, but the principle runs deeper through the entire system. The idea is that every question that can be answered before execution should be answered before execution.
In GSD alone, this happens at two points: the discuss-phase step captures your implementation preferences, and plan-phase spawns parallel research agents to investigate the domain. Skill Creator extends this by checking research necessity against the existing skill library: if the library already contains a supabase-auth skill, the system won't waste time researching authentication approaches.

Some things are genuinely unavoidable — the user changes their mind, a library has an undocumented bug, a test reveals a flawed assumption. That's why state tracking exists.
The system maintains persistent state in several files that survive across sessions, context window resets, and even system interruptions:
| File | Contents | Survives |
|---|---|---|
| STATE.md | Current position, decisions made, blockers, context for the next session | Session restart, context reset |
| ROADMAP.md | Phase completion status, what's done, what's next | Entire project lifecycle |
| {phase}-SUMMARY.md | What happened in each phase, what changed, verification results | Forever (committed to git) |
| {phase}-VERIFICATION.md | Automated verification results against phase goals | Forever |
| .planning/patterns/ | Session observations, feedback, suggestions (append-only JSONL) | Configured retention (default 90 days) |
When you run /gsd:pause-work, the system creates a handoff document capturing exactly where things stand. /gsd:resume-work reads it and picks up precisely where you left off. The executor agent doesn't need to scan the codebase to figure out what happened — it reads the state files.
Even in YOLO mode (where GSD auto-approves steps), human gates exist at critical points.
The philosophy is: automate the mundane, gate the consequential. The system can auto-commit code, auto-load skills, auto-assign models. It cannot auto-approve architectural changes, auto-deploy to production, or auto-bypass safety checks.
Scheduling happens at three levels: the GSD phase scheduler, the chipset kernel scheduler, and the wave execution scheduler. They work together but operate at different scales.
GSD's execute-phase command takes the plans created during plan-phase and groups them into execution waves. Plans that share no dependencies run in parallel. Plans with dependencies run sequentially. Each plan gets its own fresh 200K-token context window — the executor agent sees only the plan, the relevant skills, and the project state. No accumulated conversation history.
Parallel execution uses subagent spawning: the main orchestrator forks child contexts, each executing independently, then collects and integrates results. The orchestrator's own context stays lean — at 30–40% utilization even during large phases — because the heavy work happens in the subagent contexts.
The ExecKernel is a tick-driven scheduler that manages the four engine domains (context, output, io, glue). Each kernel tick runs three operations. Messages between engines are typed (KernelMessage) and routed through FIFO transport with reply-based ownership.

Engines coordinate using a 32-bit signal mask (bits 0–15 system-reserved, 16–31 user-allocatable). Signals are lightweight — a single bit flip is cheaper than sending a full message. An engine can wait on a signal bit, sleep until it's set, then wake and process. This is the same interrupt-driven scheduling pattern that the original Amiga Exec used for inter-process communication.
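The signal-mask mechanics can be illustrated with plain bit operations. Names here are assumed; only the bit layout (0–15 reserved, 16–31 user-allocatable) comes from the text:

```typescript
// Illustrative sketch of the 32-bit signal mask (function names assumed).
// Bits 0-15 are system-reserved; 16-31 are user-allocatable.
const USER_BITS_START = 16;

function allocSignal(allocated: number): { bit: number; mask: number } {
  for (let bit = USER_BITS_START; bit < 32; bit++) {
    if (((allocated >>> bit) & 1) === 0) {
      // `>>> 0` keeps the mask in unsigned 32-bit range, even for bit 31.
      return { bit, mask: (allocated | (1 << bit)) >>> 0 };
    }
  }
  throw new Error("no free user signal bits");
}

// Setting a signal is a single bit flip -- far cheaper than a full message.
const setSignal = (pending: number, bit: number) => (pending | (1 << bit)) >>> 0;
const clearSignal = (pending: number, bit: number) => (pending & ~(1 << bit)) >>> 0;
const isSignalled = (pending: number, bit: number) => ((pending >>> bit) & 1) === 1;
```

An engine waiting on a bit simply sleeps until `isSignalled` flips true, then clears the bit and processes — the same pattern Amiga Exec used.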
The wave planner produces a WaveExecutionPlan — an ordered list of waves, each containing parallel tracks. The plan includes a critical path analysis (the longest dependency chain that determines minimum execution time) and a sequential savings metric. During execution, the kernel uses this plan to orchestrate subagent spawning: one subagent per track, all tracks in a wave launched simultaneously, wave boundaries enforced as synchronization barriers.
Verification is not a checkbox. It's a multi-layer system that catches problems at different granularities.
Every task plan contains a <verify> block with a concrete, executable test. "Run the migration, check the table exists with the correct columns." The executor runs this after completing the task. If it fails, the task is marked incomplete and the orchestrator decides whether to retry, debug, or escalate.
After all tasks in a phase complete, GSD runs verify-work. This spawns a verification agent that is independent of the executor — it has no context bleed from the implementation. It checks the codebase against the phase goals, runs tests, and produces a VERIFICATION.md report. If discrepancies are found, it spawns debug agents to diagnose root causes and creates fix plans.
The verification agent extracts testable deliverables and walks the user through them one at a time: "Can you create an account?" "Does the dashboard load?" "Is the data persisted after refresh?" Each is a binary yes/no. Failed items get automatic diagnostic analysis.
/gsd:audit-milestone verifies the entire milestone against its definition of done. If gaps are found, /gsd:plan-milestone-gaps creates new phases to close them.
For milestones using the NASA SE methodology (like the OpenStack deployment), verification goes further. A requirements verification matrix maps every requirement to a TAID verification method (Test, Analysis, Inspection, or Demonstration). Each method has specific acceptance criteria, a verification procedure, and traceability back to the original requirement. NPR 7123.1 Appendix H compliance is tracked with tailoring rationale.
For engineering packs, safety-critical tests are mandatory-pass. They cannot be skipped, deferred, or overridden. The v1.49 DACP introduced 8 safety-critical tests (SC-01 through SC-08) covering script non-execution, backward compatibility, cooldown enforcement, size limits, provenance enforcement, fidelity bounding, artifact immutability, and graceful degradation. All 8 must pass before a release ships.
The OpenStack Cloud Platform (v1.33) is the flagship application of NASA SE methodology within Skill Creator. It maps the 7 NASA SE phases (Pre-Phase A through Phase F, per SP-6105 and NPR 7123.1) to cloud infrastructure operations:
| NASA Phase | Cloud Ops Mapping |
|---|---|
| Pre-Phase A: Concept Studies | Requirements gathering, architecture selection, reference design |
| Phase A: Concept & Technology Dev | Lab environment, proof of concept, technology evaluation |
| Phase B: Preliminary Design | Network topology, service architecture, security model |
| Phase C: Final Design & Fabrication | Kolla-Ansible configuration, secrets management, image builds |
| Phase D: System Assembly & Test | Deployment, integration testing, E2E verification |
| Phase E: Operations & Sustainment | Day-2 operations, monitoring, capacity management, incident response |
| Phase F: Closeout | Decommissioning in exact reverse of Phase D deployment order |
The key innovation is the communication framework: 9 typed communication loops (command, execution, specialist, user, observation, health, budget, cloud-ops, doc-sync) with priority-based bus arbitration. A HALT signal propagates within 1 cycle. A budget agent tracks token consumption and warns at 90%, blocks at 95%.
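The budget agent's thresholds lend themselves to a tiny sketch. The function shape is an assumption; the 90%/95% thresholds come from the text:

```typescript
// Hedged sketch of the budget agent's gating (function name assumed).
type BudgetStatus = "ok" | "warn" | "block";

function budgetStatus(usedTokens: number, budgetTokens: number): BudgetStatus {
  const ratio = usedTokens / budgetTokens;
  if (ratio >= 0.95) return "block"; // hard stop at 95%
  if (ratio >= 0.90) return "warn";  // warning at 90%
  return "ok";
}
```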
Three mission crews (31 agents total) handle deployment, operations, and documentation. The crews use spacecraft-style role naming — CAPCOM as the sole human interface, SURGEON for health monitoring, CRAFT for documentation. Crew handoffs include full context transfer.
The V&V plan maps 55 requirements to TAID verification methods. 22 safety-critical test procedures verify that deployment gates cannot be bypassed, that rollback procedures work, and that service health monitoring catches degradation before users notice.
The cloud operations system isn't just documentation — it's a complete operational skill pack with executable verification. It includes:
- A --dry-run mode: a 7-stage deployment verification and an 8-stage user scenario verification (from authentication through floating IP assignment)
- A chipset.yaml with pre-deploy and post-deploy evaluation gates — 118 validation checks total

The Apollo Guidance Computer (AGC) Block II simulator (v1.23) is both a working emulator and an educational curriculum. It's a deep-dive into real-time embedded systems design from 1966 — and it maps directly to how Skill Creator's own scheduling and interrupt systems work.
An authentic Display and Keyboard (DSKY) model: relay decoding, 6 numeric registers, 11 annunciators (warning lights), 19-key keyboard, VERB/NOUN command processor, and an Executive Monitor with real-time scheduling visualization. Learn mode annotations explain what the AGC is doing at each step.
A yaYUL-compatible assembler, step debugger with breakpoints and watchpoints, disassembler, and rope loader for Virtual AGC format files. 54 validation tests verify instruction-level accuracy.
11 chapters from orientation to AGC-to-GSD architectural patterns, with 8 hands-on exercises culminating in reproducing the Apollo 11 1202 alarm. The curriculum explicitly draws connections between the AGC's scheduling primitives and Skill Creator's modern equivalents — showing that the same coordination problems that NASA solved in 1966 are the same ones that AI agent orchestration faces today.
DACP is the biggest architectural addition in the v1.49 branch. It replaces markdown-only agent handoffs with a structured, verifiable, adaptive protocol. The core insight: when Agent A hands work to Agent B using only prose, Agent B interprets the prose, and interpretation drift accumulates. After several handoffs, the work has diverged significantly from the original intent. DACP fixes this.
Every agent handoff now consists of three parts:
| Part | Format | Purpose |
|---|---|---|
| Intent | Markdown (intent.md) | Human-readable description of what should happen and why. This is what a person reads to understand the handoff. |
| Data | JSON (data/) | Structured, schema-validated data. File paths, configuration values, dependency lists, test criteria — anything that must not be "interpreted." |
| Code | Scripts (code/) | Executable scripts with provenance tracking. If the handoff includes "run lint," the actual lint script is bundled — not a prose instruction to figure out linting. |
Bundles are directory-based: manifest.json, intent.md, data/, and code/, with an atomic .complete marker. Size limits enforce discipline: 50KB data payloads, 10KB per script, 100KB total. Every bundle includes a mandatory .msg fallback for backward compatibility with systems that only understand markdown handoffs.
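The size limits can be enforced with a straightforward validation pass. The limits come from the spec above; the validation function itself is a hypothetical sketch:

```typescript
// Sketch of DACP bundle size enforcement (function shape assumed).
const LIMITS = {
  dataPayload: 50 * 1024, // 50KB per data payload
  script: 10 * 1024,      // 10KB per script
  total: 100 * 1024,      // 100KB for the whole bundle
};

interface BundleFile { path: string; bytes: number }

function validateBundleSizes(files: BundleFile[]): string[] {
  const errors: string[] = [];
  let total = 0;
  for (const f of files) {
    total += f.bytes;
    if (f.path.startsWith("data/") && f.bytes > LIMITS.dataPayload)
      errors.push(`${f.path}: data payload exceeds 50KB`);
    if (f.path.startsWith("code/") && f.bytes > LIMITS.script)
      errors.push(`${f.path}: script exceeds 10KB`);
  }
  if (total > LIMITS.total) errors.push("bundle exceeds 100KB total");
  return errors;
}
```

A bundle that fails validation would fall back to its mandatory .msg markdown handoff, consistent with the backward-compatibility requirement.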
Not every handoff needs a full three-part bundle. DACP uses an adaptive fidelity model with four levels:
| Level | Contents | When Used |
|---|---|---|
| L0: Prose | Markdown only | Simple, low-risk handoffs where interpretation ambiguity is minimal |
| L1: Prose + Data | Markdown + JSON schemas | Handoffs with specific values that shouldn't be approximated |
| L2: Structured | Markdown + JSON + references | Standard development handoffs with schema-validated contracts |
| L3: Full Bundle | Markdown + JSON + executable scripts | Safety-critical, complex, or historically high-drift handoffs |
The fidelity decision engine considers five factors: data complexity, historical drift for this handoff type, available skills in the library, remaining token budget, and safety criticality. It achieved 95% accuracy across 20 test scenarios (exceeding the 85% target). Fidelity changes are bounded to a maximum of 1 level per cycle — the system can't jump from L0 to L3 in one step.
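The one-level-per-cycle bound is simple to express. This is a runtime approximation of the rule — the document says the real clampFidelityChange() is enforced at the type level:

```typescript
// Sketch of the ±1 fidelity bounding rule. The shipped implementation
// enforces this at the type level; this runtime version is illustrative.
type Fidelity = 0 | 1 | 2 | 3; // L0 prose .. L3 full bundle

function clampFidelityChange(current: Fidelity, proposed: Fidelity): Fidelity {
  // Fidelity may move at most one level per cycle, in either direction.
  const delta = Math.max(-1, Math.min(1, proposed - current));
  return (current + delta) as Fidelity;
}
```

A decision engine that wants to jump a handoff type from L0 straight to L3 gets L1 instead, and must earn the remaining levels over subsequent cycles.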
This is the learning loop applied to agent communication. After a handoff completes, the retrospective analyzer measures drift using a composite score of weighted components.
When drift exceeds 0.3, the analyzer recommends promoting that handoff type to a higher fidelity level. When drift stays below 0.05 consistently, it recommends demotion (saving tokens). Cooldown enforcement prevents thrashing: 7 days between promotions, 14 days between demotions. All recommendations require human approval.
The analyzer also detects patterns — recurring handoff types and their drift rates — and stores them in append-only JSONL for longitudinal analysis. Over time, the system learns exactly which handoff types need structure and which can safely use prose.
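The promote/demote logic with cooldowns can be sketched directly from the stated thresholds (0.3 and 0.05) and cooldown periods (7 and 14 days); the function shape is assumed:

```typescript
// Illustrative drift-to-recommendation logic with cooldown enforcement.
// All recommendations still require human approval before taking effect.
const DAY_MS = 24 * 60 * 60 * 1000;

type Recommendation = "promote" | "demote" | "hold";

function recommend(drift: number, lastChangeMs: number, nowMs: number): Recommendation {
  const elapsed = nowMs - lastChangeMs;
  if (drift > 0.3 && elapsed >= 7 * DAY_MS) return "promote";
  if (drift < 0.05 && elapsed >= 14 * DAY_MS) return "demote";
  return "hold"; // either drift is in-band or a cooldown is active
}
```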
DACP's safety model has 8 mandatory requirements, each with a corresponding safety-critical test that must pass:
| ID | Requirement | Mechanism |
|---|---|---|
| SAFE-01 | Scripts are never auto-executed | Object.freeze on all script references |
| SAFE-02 | Fidelity changes bounded to ±1 | clampFidelityChange() enforced at the type level |
| SAFE-03 | No unprovenanced data | Script catalog rejects entries without valid provenance |
| SAFE-04 | Graceful degradation | tryLoadBundle returns null, never throws |
| SAFE-05 | Cooldowns enforced | 7-day promote, 14-day demote per pattern |
| SAFE-06 | Provenance chain enforcement | Scripts without valid source skill + version are rejected |
| SAFE-07 | Size limits enforced | 50KB data, 10KB script, 100KB total |
| SAFE-08 | Backward compatibility | Every bundle has a .msg fallback |
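A few of these mechanisms can be sketched together. The shapes below are assumptions layered on the named mechanisms (Object.freeze for SAFE-01, rejection of unprovenanced entries for SAFE-03, and a null-returning tryLoadBundle for SAFE-04):

```typescript
// Hedged sketch combining SAFE-01, SAFE-03, and SAFE-04 (shapes assumed).
interface ScriptRef { readonly path: string; readonly provenance: string }

function catalogScript(ref: ScriptRef): Readonly<ScriptRef> {
  // SAFE-03: the catalog rejects entries without valid provenance.
  if (!ref.provenance) throw new Error("unprovenanced script rejected");
  // SAFE-01: the reference is frozen; it is stored, never auto-executed.
  return Object.freeze({ ...ref });
}

function tryLoadBundle<T>(load: () => T): T | null {
  try {
    return load();
  } catch {
    return null; // SAFE-04: callers fall back to the .msg markdown path
  }
}
```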
The v1.49 branch didn't stop at DACP. Six patch releases followed, each addressing real-world deployment issues:
Introduced a zone-based filesystem organization: projects/ for GSD projects, contrib/ for upstream/downstream collaboration, packs/ for educational content, and www/ for web staging. A .sc-config.json with Zod validation supports external project directories. 8 new CLI commands (sc project init, sc pack list, sc contrib status, etc.) make navigation instant. Path traversal prevention rejects .., /, and \ in project names before any filesystem operation touches disk.
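The path traversal check described above is small enough to sketch in full; the function name is an assumption, the rejected characters come from the text:

```typescript
// Minimal sketch of project-name validation (function name assumed).
// Rejects traversal and separator characters before any filesystem call.
function isSafeProjectName(name: string): boolean {
  return name.length > 0 &&
    !name.includes("..") &&
    !name.includes("/") &&
    !name.includes("\\");
}
```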
A deep refactor following Linux open source conventions. Root directories reduced from 33 to 26. Added scdoc man pages, shell completions for bash/zsh/fish, a freedesktop.org .desktop entry, AppStream metadata, and a systemd user service unit for headless agent mode. Built Debian and RPM packaging infrastructure. Implemented XDG Base Directory utilities in both TypeScript and Rust, with relative-path rejection in both implementations. 103 files changed, zero regressions across 19,222 tests.
Fixed a C++ mutex crash in onnxruntime-node on macOS (static destruction order race), replaced Bash 4.3+ features with POSIX-compatible alternatives for Bash 3.2 (macOS ships ancient Bash and will never upgrade past GPLv2), and eliminated the natural NLP package — which pulled 300MB of unnecessary dependencies including PostgreSQL, MongoDB, and Redis drivers — replacing it with 250 lines of hand-rolled, zero-dependency TF-IDF and Naive Bayes implementations. The @huggingface/transformers package moved to optional with automatic fallback to heuristic scoring.
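A zero-dependency TF-IDF fits in a few lines, in the spirit of that replacement. This sketch is not the shipped 250-line implementation, just an illustration of how little code the technique actually needs:

```typescript
// Zero-dependency TF-IDF sketch (illustrative, not the shipped code).
// Input: tokenized documents. Output: per-document term scores.
function tfidf(docs: string[][]): Map<string, number>[] {
  // Document frequency: in how many documents does each term appear?
  const df = new Map<string, number>();
  for (const doc of docs)
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);

  return docs.map(doc => {
    // Term frequency within this document.
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);

    const scores = new Map<string, number>();
    tf.forEach((count, term) => {
      const idf = Math.log(docs.length / (df.get(term) ?? 1));
      scores.set(term, (count / doc.length) * idf);
    });
    return scores;
  });
}
```

Terms that appear in every document score zero (their inverse document frequency is log 1 = 0), which is exactly the property that makes TF-IDF useful for relevance ranking.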
The Physical Infrastructure Engineering Pack (v1.48) demonstrates VTM at full scale. An 80-requirement milestone executed across 12 phases, 30 plans, and 52 commits in 3 sessions (~4.5 hours wall clock). It covers:
The Safety Warden runs in a non-bypassable architecture: every engineering calculation passes through safety before it reaches the user. PE (Professional Engineer) disclaimers are structural, not configurable. The warden was implemented as Wave 0 — before any domain skill — so there was never a moment during development when a calculation could exist without safety annotations.
The retrospective noted that 80-requirement milestones need 3 sessions, with context budget exhaustion predictable at about 25–30 requirements per session. Sessions split naturally at wave boundaries. The VTM-to-GSD pipeline was used for the seventh consecutive milestone, confirming the pattern's reliability.
Everything in this guide — VTM pipelines, wave planners, NASA SE phases, AGC simulators, DACP bundles, safety wardens — traces back to one principle:
Specialized, constrained building blocks composed intelligently will outperform general-purpose brute force. A skill that uses 2% of the context window is more powerful than a mega-prompt that uses 30%, because the small skill can compose with twenty others. A three-part handoff bundle with schema validation catches drift that prose instructions never would. A wave planner that maximizes parallelism finishes in half the time of sequential execution. None of these components is individually complex. Their power comes from composition.
The Amiga's custom chips — Agnus, Denise, Paula, Gary — were not individually powerful. They were precisely constrained for their domains, and they composed through a shared bus architecture. A 7 MHz system outperformed 16 MHz competitors because every transistor was doing work that mattered, and nothing fought itself.
Skill Creator applies this insight to AI agent orchestration. Skills are custom chips. Agents are chipset configurations. Teams are system architectures. The kernel is the scheduler. Copper lists are declarative workflow programs. The token budget is the memory bus bandwidth. And the AMIGA Principle — that the right constraints, applied to the right components, composed through the right interfaces, produce capabilities that exceed the sum of their parts — is the through-line that makes all of it work.
The system has grown from 6 capabilities in v1.0 to 139 in v1.33, with 19,222 tests as of v1.49.5. It has produced approximately 350,000 lines of code across 32 milestones, 278+ phases, and 740+ plans. Every one of those milestones was built using the same system it was building — the ultimate test of whether the tools actually work.