GSD
Going Further
Skills, MCP, and Multi-Repo Workflows
Three advanced GSD topics, each delivered through a complete walkthrough you can follow at the keyboard:
Skills & Slash Commands — the SKILL.md format, the 12-runtime install matrix, and how to inject custom domain knowledge into GSD’s subagents.
MCP Integration — connecting GSD to external tools via the Model Context Protocol, with full walkthroughs for Google Stitch UI generation and schema-aware planning via a Postgres MCP.
Multi-Repo Workflows — workstreams, workspaces, the STATE.md ownership rule, and patterns for teams running GSD across microservice fleets.
Tibsfox
For GSD v1.35.0
Audience: GSD users ready to customize, integrate, and scale
Released under CC BY-SA 4.0. Authored by Tibsfox. GSD itself is MIT-licensed — see gsd-build/get-shit-done.
Skills & Slash Commands
How GSD Actually Works — Skills Anatomy
Most GSD users never look under the hood. They install the tool, type
/gsd:new-project, and watch Claude Code do the rest. That
is exactly how it is supposed to feel. But the moment you want to bend
GSD to fit your team, your domain, or your personal taste, the black box
stops being a feature and starts being an obstacle.
This chapter opens the box. By the end of it you will know where GSD stores its skills on disk, what a single skill file looks like, how Claude Code decides which skills to activate, and why the installer ships two different layouts depending on which version of Claude Code you have. None of this requires source code spelunking — everything GSD installs is plain markdown, which means you can read it, diff it, copy it, and edit it with the same tools you already use.
Finding the Skills on Disk
On a Claude Code 2.1.88 or newer install, GSD writes its skills into
your per-user Claude configuration directory. On Linux and macOS that
lives at ~/.claude/skills/; on Windows it is
%USERPROFILE%\.claude\skills\. The GSD installer creates
one sub-directory per command, each prefixed with gsd-, and
each containing a single SKILL.md file:
~/.claude/skills/
├── gsd-new-project/
│ └── SKILL.md
├── gsd-plan-phase/
│ └── SKILL.md
├── gsd-execute-phase/
│ └── SKILL.md
├── gsd-verify-work/
│ └── SKILL.md
├── gsd-progress/
│ └── SKILL.md
├── gsd-help/
│ └── SKILL.md
└── ... (30+ more)
Go ahead and navigate there now. Run
ls ~/.claude/skills/ on your own machine. You should see a
long list of gsd-* directories plus any other skills Claude
Code has picked up from other tools or from your own hand-authored
experiments. This is not a sandbox — it is your real home directory, and
those markdown files are the exact instructions Claude reads when you
invoke a GSD command.
You can read any GSD skill file. They are plain markdown — no
compilation, no opaque binaries, no generated code. If you are ever
unsure what a /gsd:* command is actually going to
do, open its SKILL.md in your editor and read it. That is
the entire contract.
The SKILL.md Format
Every skill file Claude Code 2.1.88 and later consumes follows the
same shape: a YAML frontmatter block, a heading, and an instruction
body. The frontmatter carries two required keys — name and
description — and the body is markdown prose that becomes
the instruction set the model sees when the skill activates.
Here is the minimum viable skill file:
---
name: gsd-progress
description: "Show GSD project progress and current phase status"
---
# GSD Progress Report
When the user asks about the current state of the project, read
`.planning/STATE.md` and `.planning/ROADMAP.md`. Produce a compact
summary containing:
1. The current milestone (from ROADMAP.md).
2. The active phase (from STATE.md).
3. The most recent three completed work items.
4. Any open blockers recorded under the "Blockers" heading.
Render the output as a markdown table with two columns: "Item" and
"Status". Do not fabricate phase data --- if STATE.md is missing, say
so and suggest running `/gsd:new-project` to initialize it.
That is a complete, working skill. Drop that file into
~/.claude/skills/gsd-progress/SKILL.md, reopen Claude Code,
and typing “show me the GSD progress” will surface it as a
candidate skill. The name field is the stable identifier —
it also determines the slash-command alias (/gsd:progress
maps to gsd-progress) on runtimes that expose slash
commands. The description field is where the real magic
lives, and it deserves its own section.
How Trigger Descriptions Work
Claude Code does not load every skill on disk into every request. That would blow the context window inside of a dozen commands. Instead the runtime keeps a lightweight manifest of all installed skills — just their names and descriptions — and when a new user message arrives, it pattern matches the message against those descriptions to decide which skills to promote into the full context. Only promoted skills have their instruction body loaded.
The description is therefore the activation
contract. A vague description like "Help with GSD"
will match almost everything and almost nothing usefully. A concrete one
like "Show GSD project progress and current phase status"
will match phrases like “where are we on the project”,
“what phase are we in”, or “give me a status update”
without swamping unrelated requests.
When you author your own GSD-adjacent skills, invest time in the description line. Use action verbs, name the artifact you operate on, and include a few of the phrasings users actually say out loud. A good description is not a summary of what the skill does — it is a list of triggers that should fire it.
The frontmatter is strict YAML. Unquoted colons in the description will break the parser and silently disable the skill. When in doubt, wrap the description in double quotes. This is the single most common reason a hand-authored skill “doesn’t work”.
The Installer’s Role
GSD ships as an npm package. When you run
npx gsd-build install, the installer does three things in
sequence:
Detect the runtime. It looks at which agent CLIs are installed on the current machine — Claude Code, Codex, Cursor, Windsurf, and so on — and also at the version of Claude Code it finds. The 2.1.88 cutoff matters: earlier versions do not understand the skills directory layout at all.
Read its source skill definitions. GSD carries its command definitions in a single canonical form inside the npm package. These are the master copies.
Transform and write. For each detected runtime, the installer rewrites the master skill into whatever format that runtime actually consumes, then drops the result into the right directory. One master, many derivatives.
This architecture is why the same
/gsd:plan-phase command behaves identically whether you are
driving Claude Code, Codex, or Cursor — the installer is doing the
translation work up front so the user-facing surface is uniform.
Skills versus Legacy Slash Commands
Before Claude Code 2.1.88, Claude Code had no notion of skills. The
only way to ship reusable instructions was as a slash command:
a markdown file living at ~/.claude/commands/gsd/[name].md
that the user invoked by literally typing its name. There was no
description field, no trigger matching, no auto-activation — the user
had to know the command existed and type it explicitly.
GSD still supports this old world. If the installer detects a pre-2.1.88 Claude Code install, it writes legacy slash commands instead of skills:
~/.claude/commands/gsd/
├── new-project.md
├── plan-phase.md
├── execute-phase.md
├── verify-work.md
└── ...
You interact with both layouts the same way from the user side —
/gsd:plan-phase works whether it resolves to a skill or a
legacy command — but under the hood the runtime treats them differently.
Skills are auto-activating and context-aware; legacy commands only fire
on an explicit invocation. If you upgrade Claude Code from an older
version, run npx gsd-build install again and the installer
will migrate you forward automatically, leaving the legacy directory in
place so nothing breaks mid-session.
What You Can Do With This Knowledge
Once you internalize that GSD skills are just markdown on disk, a
long list of previously scary tasks becomes trivial. You can diff two
installs to see what changed between GSD releases. You can keep a local
patch directory and re-apply your own edits after an upgrade. You can
fork a skill, rename it to gsd-plan-phase-health, tailor it
to your domain, and have both the generic and the specialized version
coexist in the same directory. You can hand-write skills that call GSD
primitives without waiting for upstream to ship them. Everything is
text, everything is readable, and nothing is hidden.
The rest of this part walks through each of those capabilities in detail, starting with the compatibility matrix that determines where your skills actually land.
Skills Across Runtimes
GSD is not a Claude Code exclusive. The installer knows about a dozen different agent runtimes, each with its own directory layout, its own file format, and its own opinions about how instructions should be delivered to the model. This chapter is the reference you will reach for when a GSD command that works on one machine appears to be missing on another. The short answer is almost always “it installed fine, but it lives somewhere else on this runtime.”
The Compatibility Matrix
The table below lists every runtime the GSD installer currently supports, the format it emits for that runtime, the directory where the output lands, and the command you can run to confirm the install succeeded.
| Runtime | Install Format | Directory (Global) | Verify Command |
|---|---|---|---|
| Runtime | Install Format | Directory (Global) | Verify Command |
| Claude Code 2.1.88+ | skills/gsd-*/SKILL.md |
~/.claude/skills/ |
/gsd:help |
| Claude Code (legacy) | commands/gsd/*.md |
~/.claude/commands/ |
/gsd:help |
| Codex | skills/gsd-*/SKILL.md |
~/.codex/skills/ |
$gsd-help |
| Copilot | prompts + agents | ~/.github/ |
/gsd:help |
| Cursor | skills/gsd-*/SKILL.md |
~/.cursor/ |
/gsd:help |
| Windsurf | markdown transform | ~/.windsurf/ |
/gsd:help |
| OpenCode | config file | ~/.config/opencode/ |
/gsd-help |
| Gemini CLI | skills | ~/.gemini/ |
/gsd:help |
| Antigravity | skills | ~/.gemini/antigravity/ or
./.agent/ |
/gsd:help |
| Cline | .clinerules |
~/.cline/ |
rules auto-loaded |
| Augment | skills transform | ~/.augment/ |
/gsd:help |
| Qwen Code | skills (open standard) | ~/.qwen/skills/ |
/gsd:help |
Twelve runtimes, twelve directory layouts. Four of them (Claude Code 2.1.88 and later, Codex, Cursor, and Qwen Code) converge on the same skills-with- SKILL.md shape, which is the format this book treats as canonical. The other eight each require the installer to do some amount of translation work, and understanding those translations is the key to debugging any cross-runtime weirdness you run into.
How the Installer Transforms
The installer maintains a single internal representation for each GSD command — call it the “source form.” When a runtime requires a different format, the installer applies a translation pass. Here are the three most important translations.
Skills into Prompts (Copilot)
GitHub Copilot does not have a skills directory. It has a
prompts/ directory for standalone prompt files and an
agents/ directory for role definitions, and the two work
together. The installer splits each GSD skill in half: the instruction
body becomes a prompt file under ~/.github/prompts/gsd/,
and the trigger metadata plus any agent role information becomes an
agent definition under ~/.github/agents/. Invoking
/gsd:plan-phase in Copilot fires both sides
simultaneously.
Skills into Rules (Cline)
Cline takes the opposite approach. It has no concept of slash
commands at all — instead, it reads a .clinerules file and
treats the entire contents as a persistent instruction prefix injected
into every request. The installer concatenates every GSD skill body into
a single .clinerules file and drops it at
~/.cline/.clinerules. You will not find a
/gsd:plan-phase slash command in Cline because there are no
slash commands to find; the rules simply load automatically and Cline
recognizes the GSD phrasings when you use them conversationally.
Skills into TOML + Markdown (Codex)
Codex uses a paired-file convention: a .toml file
declares the skill metadata, and a matching .md file
alongside it carries the instruction body. The installer emits both
halves. If you open ~/.codex/skills/gsd-plan-phase/ you
will see a skill.toml with the name and description fields,
plus a SKILL.md with the prose. Codex also uses a different
invocation prefix: $gsd-help instead of
/gsd:help, which is worth memorizing if you move between
machines.
Runtime-Specific Gotchas
Even when you know the translation rules, a few runtimes have quirks worth calling out explicitly.
Cline Has No Slash Commands
If a new Cline user complains that /gsd:plan-phase
“isn’t working,” they have usually not done anything wrong — Cline
literally does not support slash commands. The fix is not to reinstall.
The fix is to describe what you want in natural language (“plan the
next phase”) and let the injected .clinerules do its
job.
Copilot Needs Both Halves
The prompt and agent files Copilot writes work together, and if only
one half lands (for example, because the installer was interrupted or
because ~/.github/agents/ already contained a conflicting
file), the GSD command will misbehave in subtle ways. Re-running
npx gsd-build install –force will overwrite any
half-written state and is the standard fix.
Codex Uses TOML Pairs
If you hand-edit a Codex skill, remember that the metadata lives in
the .toml file, not in frontmatter. Editing the
SKILL.md body to change the description field
will have no effect — Codex never reads frontmatter. Edit the
.toml.
Antigravity Has Two Install Dirs
Antigravity supports both a global install at
~/.gemini/antigravity/ and a project-local install at
./.agent/. The installer picks global by default but will
honor the –local flag to install into the current project.
This is the only runtime where GSD supports per-project scoped command
installs at the runtime layer; for every other runtime, per-project
customization happens through the agent_skills mechanism
described in the next chapter.
Non-Claude Model Profiles
Most GSD subagents (the executor, planner, researcher, and verifier roles) are built on Claude Code’s subagent spawn primitive, which defaults to whichever Anthropic model the parent session is running. That is usually what you want — consistency across the session — but it is a footgun the first time you run GSD inside an OpenRouter-fronted Claude Code or on a locally hosted model.
The problem: even when the parent session is routed through OpenRouter, a spawned subagent may try to call Anthropic directly unless you explicitly tell it not to. You end up paying OpenRouter for the parent and Anthropic for every subagent spawn simultaneously.
GSD ships an inherit profile for exactly this case. When
you select the inherit profile (via
/gsd:set-profile inherit or by editing
.planning/config.json), every spawned subagent will reuse
the parent session’s model configuration instead of defaulting to
Anthropic. If the parent is on OpenRouter, the subagents are on
OpenRouter. If the parent is on a local Ollama model, the subagents are
too.
If GSD subagents call Anthropic models and you are paying through
OpenRouter, switch to the inherit profile or you will see
double-billing. Run /gsd:set-profile inherit once per
project.
Verifying an Install
Regardless of runtime, the fastest way to confirm GSD is actually
installed is to run the /gsd:help command (or its
runtime-specific equivalent from the matrix). It is a skill like any
other, and if it responds with the GSD command listing, everything else
is wired up correctly. If it does not respond, the first diagnostic step
is to ls the install directory for your runtime and confirm
the gsd-* entries exist. Ninety percent of install problems
are really “I am on a runtime I did not expect to be on” problems, and
the matrix is there to cut that debugging time down to seconds.
Agent Skills Injection
So far everything in this Part has been about global skills
— the files GSD installs into your per-user directory that apply to
every project you work on. That is the right default for the command
layer. But GSD also has a sharper tool for per-project customization,
one that does not require touching the global directory at all:
agent_skills.
agent_skills is the mechanism by which a single project
teaches its GSD subagents things about its own domain. Those teachings
live inside the project, travel with it through version control, and
apply only while you are working in that project. No other project sees
them. No global state gets polluted.
The Concept
When GSD spawns a subagent — an executor running a phase, a planner drafting the next set of work items, or a researcher collecting domain information — it builds the subagent’s system prompt from a stack of pieces: the subagent’s role definition, the current project state, the task at hand, and optionally, one or more injected skill blocks drawn from local project directories.
Those injected blocks arrive in the subagent’s prompt as an
<agent_skills> section. From the subagent’s point of
view, they are just more instructions — indistinguishable from the
built-in role definition except that they came from a project-local
directory rather than from the GSD source tree. This is pure prompt
augmentation. There is no plugin system, no sandbox, no new execution
path. The injected content simply becomes part of what the subagent
sees.
That simplicity is the feature. You can teach a GSD researcher about FDA endpoints the same way you would teach a human researcher about them: by handing over a short written brief. The researcher reads the brief, integrates it into its working knowledge, and uses it for the duration of the task.
Configuration Syntax
agent_skills is configured in the project’s
.planning/config.json file. The key is
agent_skills, and its value is an object mapping subagent
type names to arrays of directory paths. Each path is resolved relative
to the project root.
{
"agent_skills": {
"executor": ["./skills/domain-knowledge/"],
"planner": ["./skills/planning-rules/"],
"researcher": ["./skills/research-sources/"]
}
}When a subagent of a given type spawns, the installer walks the
directories listed under that type, reads every .md file it
finds, concatenates the contents, and wraps the result in an
<agent_skills> block inside the subagent’s prompt.
Files are read in lexicographic order, so prefixing names with
01-, 02-, and so on gives you explicit control
over the final ordering when order matters.
Which Agent Types Accept Skills
Not every subagent type honors agent_skills. As of the
current release the supported types are:
executor — the phase-running worker. Injected skills here shape how code gets written, which conventions get followed, and which libraries get preferred. Good for “our codebase uses X pattern” style rules.
planner — the task decomposition agent. Injected skills here shape how phases get broken into work items and what kinds of safety rails the planner enforces. Good for “never do X without Y” style rules.
researcher — the information-gathering agent. Injected skills here shape which sources get consulted and what domain vocabulary gets used in the resulting
RESEARCH.md. Good for teaching the researcher about APIs, standards, databases, and terminology it would not otherwise know.
The verifier, documenter, and other specialized subagent types
currently use a fixed instruction set and do not read
agent_skills. That may change in a future release; for now,
if you need to customize verification behavior, you do it by editing the
global skill file directly.
Writing a Skill Directory
A skill directory is just a directory. Create it anywhere under your
project root (convention is ./skills/[name]/), drop one or
more .md files in it, and list the directory in
agent_skills. That is the whole contract.
The markdown files inside do not need frontmatter. They do not need a
name or description field. They just need to
contain the instructions you want the subagent to read. You can write
them in whatever style feels natural — prose, bullet lists, labeled
sections, annotated code examples — and the subagent will parse them the
same way it parses any other instruction block.
The reason naming is descriptive (rather than formal) is that every
.md file’s content gets concatenated together at spawn
time. The subagent does not see file boundaries; it sees one long block
of text. File names serve only two purposes: ordering (lexicographic)
and readability for you when you come back to the directory in six
months.
How Injection Affects Behavior
The injected <agent_skills> block lives inside the
subagent’s prompt for the entire duration of its task. That means:
The instructions persist across every tool call the subagent makes. A researcher that spawns a web search, then a file read, then another web search still has the FDA brief visible through all three calls.
The instructions are visible to the model at every decision point. When the model is picking which URL to fetch or which section to write first, it is weighing those choices against the injected skill content.
The instructions do not persist beyond the task. Once the subagent exits, the injected block goes with it. The next subagent spawn rebuilds the prompt from scratch, picking up any edits you made to the skill files in between.
That last point is important: you can edit a skill file mid-session and the next subagent spawn picks up the change immediately. There is no rebuild step, no cache to invalidate, no installer to re-run.
A Minimal Example
Suppose you are building a product that must never merge to
main without first running an integration test. You want
the GSD planner to refuse to generate a plan that skips this step. Here
is the full contents of
./skills/planning-rules/no-untested-merges.md:
# Merge Safety Rule
Never produce a plan whose final step is a merge to `main`, `master`,
or any protected branch unless the plan also contains an explicit
integration test step that runs BEFORE the merge and must pass.
If a user asks for a plan that would merge without testing, respond
with a plan that adds the missing test step and a one-line note
explaining why.
Applies to: all merge-containing phases, all branches matching
`main`, `master`, `release/*`, or `prod`.
Six lines of instruction. That is enough. With this file in place and
planner pointing at ./skills/planning-rules/
in agent_skills, every subsequent
/gsd:plan-phase invocation in this project will generate
plans that include a test step before any merge, and will add a short
justification when the user’s request would have skipped one.
The planner did not need to be recompiled. The GSD installer did not need to be rerun. You wrote six lines of markdown, saved the file, and the next planner spawn behaved differently. That is the entire feature.
agent_skills are project-local. They are defined in
.planning/config.json, they reference paths under the
project root, and they do not affect any other project on the same
machine. Commit the skill directories along with your code so the next
developer who clones the repo gets the same behavior automatically.
When to Reach for This Tool
agent_skills is the right tool whenever your
customization need has three properties: it is specific to one project,
it is worth writing down, and it would benefit from being seen by a
particular subagent role on every spawn. Domain vocabulary, compliance
rules, in-house API conventions, preferred libraries, house coding
standards, data-handling constraints — all of these fit naturally.
It is the wrong tool for global rules (those belong in a hand-authored global skill), for one-off asks (those belong in the prompt itself), and for rules the whole team has not yet agreed on (those belong in a discussion, not a config file). Within its sweet spot, though, it is the single most powerful lever GSD gives you for shaping subagent behavior — and the next chapter walks through a complete, end-to-end use of it.
Scenario — Custom Research Skill
You are three weeks into building a small health-tech application — a patient intake form that needs to map its fields cleanly to the FHIR Patient resource, check drug interactions against an authoritative source, and keep one eye on the 510(k) clearance pathway in case the product ever grows into a regulated medical device. You are using GSD to manage the work. Planning, execution, and verification all run smoothly. But the research phase keeps coming up short.
Here is what you want to fix, and how agent_skills fixes
it.
The problem.
You ran /gsd:research-phase yesterday on the topic
“data sources for drug interaction checking.” The researcher
subagent did its job: it spawned, ran a batch of web searches, read a
handful of Wikipedia pages and generic developer blogs, and wrote a tidy
RESEARCH.md file into .planning/research/.
Tidy, but wrong for your use case.
The output was full of generic phrases like “drug interaction
APIs such as those offered by various healthcare data vendors,”
hand-waving references to “the standard medical data format,”
and not a single mention of the FDA’s openFDA Drug Event API, the RxNorm
concept hierarchy, or the DrugBank identifier system. The section on
data formats talked about JSON generically but never named FHIR, never
mentioned the MedicationStatement resource, and never
hinted that a serious health-tech product would need to handle HL7
interop.
This is the researcher working exactly as designed. The problem is not that it is bad at research; the problem is that it has no reason to prefer FDA over DrugBank over a random consumer health blog, because nobody told it which sources are authoritative in your domain. You are going to tell it now.
Step 1: Create the skill directory.
Start by carving out a home for the domain brief. GSD convention is
to put project-local skill directories under ./skills/, so
make one there with a descriptive name:
mkdir -p ./skills/health-research/
touch ./skills/health-research/SKILL.mdOpen that new SKILL.md in your editor and write a real
brief. This is the file you will hand to every future researcher spawn,
so make it concrete. Name the endpoints, name the resources, name the
regulations. Here is the full contents of the file we used for this
project:
# Health-Tech Research Brief
When researching any topic related to drug data, patient records,
clinical workflows, or medical devices, prefer the following
authoritative sources over generic web results. Always cite them
by name in the final RESEARCH.md.
## Drug data
- FDA openFDA Drug Event API:
https://api.fda.gov/drug/event.json
Use for adverse event lookups and label data.
- RxNorm (NLM):
https://rxnav.nlm.nih.gov/REST/
Use for normalized drug name resolution.
- DrugBank (academic tier):
https://go.drugbank.com/
Use for interaction data; cite the DrugBank ID.
## Clinical trials and evidence
- ClinicalTrials.gov search API:
https://clinicaltrials.gov/api/
Use for active and completed trials, inclusion criteria,
and results summaries.
## Data model
- FHIR R4 resource definitions:
https://hl7.org/fhir/R4/resourcelist.html
When modeling patient data, reference the exact FHIR resource
and field names (e.g., Patient.identifier, Observation.code,
MedicationStatement.medicationCodeableConcept).
## Regulatory
- HIPAA Privacy Rule: treat all PHI fields as requiring encryption
at rest and in transit. Never store unmasked PHI in logs or
telemetry.
- 510(k) Premarket Notification: if a feature could classify as
a medical device, flag it and link to
https://www.fda.gov/medical-devices/premarket-submissions/
premarket-notification-510k
Notice what this file is not. It is not a summary of everything there is to know about health tech. It is not a tutorial. It is not a policy document. It is a short, dense list of “here are the sources to use and the vocabulary to use when writing up your findings.” Twenty-some lines is plenty. The researcher is already capable of doing research; you are just handing it a better rolodex.
Step 2: Configure agent_skills.
Open .planning/config.json and add a
researcher entry to the agent_skills object.
If the object does not yet exist, create it. Here is the relevant
snippet with the new entry in place:
{
"project_name": "health-intake",
"profile": "inherit",
"agent_skills": {
"researcher": ["./skills/health-research/"]
}
}Save the file. That is the entire configuration step. No installer
rerun, no restart, no validation pass. The GSD runtime reads
.planning/config.json on every subagent spawn, so the next
time a researcher starts, it will pick up the new skill directory.
Step 3: Run /gsd:research-phase.
Re-run the same research phase you ran yesterday. This time the
planner subagent reads your request, kicks off a researcher spawn, and
the GSD runtime assembles the researcher’s prompt: role definition, task
description, project state, and now — because you wired it up — an
<agent_skills> block containing your health-tech
brief.
/gsd:research-phase "data sources for drug interaction checking"What happens inside the spawn: the researcher reads the brief as part
of its initial context, sees the named endpoints (openFDA, RxNorm,
DrugBank), and queues those searches first. When it hits web results for
generic health blogs, it now has a yardstick to measure them against,
and it deprioritizes them. When it comes time to write up findings in
RESEARCH.md, the vocabulary from the brief —
MedicationStatement, Observation.code, HIPAA,
510(k) — shows up in the prose because the researcher saw
those terms as part of its working instructions.
Step 4: Verify RESEARCH.md.
Open .planning/research/RESEARCH.md and confirm the
domain knowledge landed. Here is an excerpt from the file our example
produced, covering the “Drug data sources” and “Data format”
sections:
## Drug data sources
Primary: FDA openFDA Drug Event API
(https://api.fda.gov/drug/event.json). Use `search=` query
parameter for adverse event lookups; rate limit 240 req/min
unauthenticated.
Normalization: RxNorm via the NLM RxNav REST endpoint
(https://rxnav.nlm.nih.gov/REST/). Use `rxcui` lookup to
collapse brand names to concept IDs before interaction checks.
Interaction data: DrugBank academic tier. Cite DrugBank ID
(format DBnnnnn) in all interaction records.
## FHIR Patient resource shape
- Patient.identifier (array of Identifier)
- Patient.name (array of HumanName)
- Patient.telecom (array of ContactPoint, PHI)
- Patient.birthDate (date)
- Patient.address (array of Address, PHI)
PHI fields MUST be encrypted at rest per HIPAA Privacy Rule.
Twelve lines, and every one of them is grounded in something the brief told the researcher to care about. Endpoints are cited with real URLs. Resources are named by their exact FHIR field paths. The HIPAA note is present because the brief flagged it. Compare this to yesterday’s output and the difference is not subtle — this reads like something a health-tech engineer would actually hand to a coworker, not a generic tour of the internet.
Step 5: Iterate.
You will not get the brief perfect on the first try. Maybe the
researcher missed the SNOMED CT code system, or over-cited DrugBank when
RxNorm would have been more appropriate, or under-cited the 510(k)
clearance pathway because your wording was too hedged. The fix is
straightforward: open ./skills/health-research/SKILL.md
again, sharpen the instructions, save the file, and re-run
/gsd:research-phase. The next researcher spawn picks up the
edits without any rebuild step, because agent_skills reads
live from disk on every spawn.
If you find that a single monolithic file is getting unwieldy, split
it into multiple markdown files inside the same directory. Name them
01-sources.md, 02-fhir.md,
03-regulatory.md, and GSD will concatenate them in
lexicographic order. The subagent still sees one continuous block, but
you get to reason about the pieces separately.
When you are happy with the brief, commit it. The
./skills/health-research/ directory and the updated
.planning/config.json both belong in version control, and
every developer who clones the repo will get the same researcher
behavior automatically — no setup steps, no onboarding friction, no
tribal knowledge about “the way we do research on this project.” The
knowledge is written down, and it travels with the code.
Skill files are live. Edit and re-run — there is no rebuild step. If a GSD subagent is missing context, the fastest path to a fix is almost always “write it into the skill file and spawn again.”
You have now seen the full shape of GSD’s skills system: global
skills that live in your per-user directory and apply everywhere, legacy
slash commands for older Claude Code versions, twelve runtime targets
that all converge on the same user experience, and project-local
agent_skills that let you teach GSD subagents anything your
domain needs them to know. Part II picks up from here and looks at
agents and teams — the mechanism by which GSD composes subagents into
larger, coordinated work units.
MCP Integration
MCP Fundamentals for GSD Users
If Part I was about teaching GSD to speak your project’s dialect, Part II is about teaching GSD to reach outside of itself. Most of the real work developers do involves tools: design tools, databases, ticket systems, cloud consoles, monitoring dashboards, that one custom script your team lead wrote three years ago that nobody fully understands. Claude Code can talk to all of them — but only if a bridge exists. That bridge is MCP.
What MCP Actually Is
MCP — the Model Context Protocol — is an open standard, published by Anthropic, for connecting AI assistants to external tools and data sources. At its heart it is a small, stable JSON-RPC interface. A program that implements MCP is called an MCP server. A program that connects to one is called an MCP client. Claude Code is a client. Almost everything else in this chapter is a server.
Each MCP server exposes a handful of tools. A tool has a
name, a JSON schema describing its arguments, and a function that runs
when Claude Code invokes it. During a conversation, Claude Code sends a
list-tools request to every connected server at startup, merges the
results into its own tool surface, and from that point on, calling an
MCP tool looks identical to calling a built-in one. The assistant does
not know or care whether read_file is implemented in Claude
Code itself or served by a filesystem MCP running on your laptop.
This is a much bigger deal than it sounds. Before MCP, every integration between an assistant and an outside service had to be hand-coded, vendor by vendor, with a bespoke protocol each time. MCP collapses that sprawl into one JSON-RPC shape that anyone can implement in a weekend. You write the server once, and every MCP-aware client — Claude Code, Claude Desktop, the various editor plugins — can use it without modification.
Why MCP Matters for GSD Specifically
Here is the part that surprises people: you don’t have to tell GSD anything about your MCP servers. It figures them out on its own.
As of GSD v1.30 (see CHANGELOG entry #1603), the executor and planner
subagents are explicitly instructed, in their system prompts, to inspect
the current set of available tools before drafting a plan or executing a
task, and to use any MCP tools they find as if they were first-class
capabilities. That was a one-line change to the subagent prompt
templates and it unlocked an enormous amount of behavior. If you connect
a Postgres MCP server to Claude Code, the planner now sees
postgres.query in its tool list and will naturally reach
for it when you ask it to plan a database migration. If you connect
Google Stitch’s MCP, the planner sees
stitch.generate_screen_from_text and will write plan steps
that call it directly.
You don’t edit a GSD config file to declare the tool. You don’t
hand-wire anything in .planning/. The subagents inherit
Claude Code’s entire tool surface, MCP and all, and they use what they
find.
GSD’s agents discover MCP tools automatically — that’s what v1.30’s #1603 fix added. Just configure the MCP server in Claude Code and GSD will pick it up on the next planning or execution run.
Transport Types: stdio and HTTP
MCP servers come in two flavors, distinguished by how the client talks to them.
stdio Transport
The most common shape is stdio: the client launches the
server as a local subprocess and exchanges JSON-RPC messages over the
child’s standard input and standard output streams. This is simple,
fast, and completely local — no network stack, no authentication, no
ports to open. Most of the MCP servers you will install for personal use
are stdio servers, invoked as npx or uvx
commands. The filesystem server, the GitHub server, the official
Postgres reference server, the Stitch proxy we’ll meet in Chapter 7: all
stdio.
The lifecycle is straightforward. When you start a Claude Code session, the client reads your MCP config, spawns each stdio server as a subprocess, speaks JSON-RPC to it over pipes, and keeps the subprocess alive for the duration of the session. When the session ends, the subprocess is killed. Restarting Claude Code restarts all your stdio servers from a clean slate.
HTTP Transport (Streamable HTTP)
The other shape is remote HTTP, now formally called Streamable HTTP in the MCP specification. Instead of spawning a subprocess, the client connects to an HTTP endpoint over the network and exchanges messages as HTTP requests with server-sent events for streaming. This is how you would connect to a shared team MCP server, a cloud service that exposes an MCP endpoint, or any tool that doesn’t want to run on your machine.
HTTP transport is more complicated in exchange for more power: you have to think about authentication (API keys in headers, bearer tokens, OAuth flows), endpoint availability, and the fact that your network is now in the critical path of every tool call. It is the right choice for production integrations and the wrong choice for personal utilities.
How Discovery Works
When Claude Code starts a session, it walks a short, ordered list of
configuration sources and merges the results. The project-scoped file
.mcp.json in the current working directory comes first; the
user-scoped file ~/.claude.json comes second. A server name
defined in the project file takes precedence over the same name in the
user file, which means you can ship a project-local override for a
server that you otherwise use everywhere.
For each server entry, Claude Code either launches a subprocess
(stdio) or opens a connection (HTTP), sends the MCP handshake, and asks
for the server’s tool listing. Those tools are added to the set of
things the assistant can invoke in the current conversation. GSD’s
subagents, when they are spawned by commands like
/gsd:plan-phase or /gsd:execute-phase, inherit
the same tool surface from the parent Claude Code process. There is no
second discovery step, no second config file, no second place for things
to go wrong.
The upshot is that an MCP server you add once is immediately available everywhere GSD does useful work: planning, execution, verification, review. The subagents don’t need to be taught about the new tool. They just find it sitting on the shelf.
A Short History and Why It Matters
MCP is young. The specification was first published in late 2024, the reference implementations appeared shortly after, and the ecosystem of third-party servers has been expanding roughly monthly since then. In practical terms this means three things. First, the protocol surface is still small enough to read in an afternoon — if you’re curious about what an MCP message looks like on the wire, the spec is not yet so sprawling that you need a guide to navigate it. Second, new servers are appearing constantly, and the server you want probably exists (or will exist by next month) even if it doesn’t show up on the first page of search results today. Third, the protocol is versioned and backwards-compatible in a way that means the config patterns in this guide are unlikely to break under you in the near term — when the spec evolves, it does so by adding new capabilities to the negotiation step rather than breaking old message shapes.
The reason any of this matters for a GSD user is that the protocol’s youth shows up in rough edges. Error messages from MCP servers are sometimes less polished than the equivalent error from a mature vendor SDK; documentation for less-common servers is sometimes scattered across a README, a changelog, and one Discord thread; and the first time you wire up a new server, expect to spend ten minutes figuring out which environment variable the server’s author chose to name the API key. None of this is fatal. All of it is the cost of being a year early. The payoff is that the glue code you write today will still work in three years, because the protocol itself is designed to outlive any one vendor’s implementation.
What MCP Is Not
A brief list of things MCP is not, because misconceptions in this space are expensive.
MCP is not a replacement for HTTP APIs. If you have a REST API that you love and that works, MCP does not make it faster or better; it wraps it. The value of MCP is not performance. It is standardization. An MCP server in front of your REST API means any MCP-aware client can use it without hand-coding the integration. That’s worth something if you have more than one client. It’s worth less if you only ever use one.
MCP is not an agent framework. The protocol says nothing about how the assistant decides which tool to call, how to chain tool calls together, how to handle errors, or how to plan multi-step work. That’s the assistant’s job. MCP just moves bytes between a client and a server. The intelligence lives on either side of the wire, not in the wire itself.
MCP is not a permissions system. Giving a tool to Claude Code via an MCP server grants the assistant whatever access the underlying credentials provide. If your Postgres MCP server connects with a superuser account, Claude Code gets superuser access to the database, and so does every subagent GSD spawns. Scope your credentials at the database, filesystem, or API level — that’s where permissions belong — and treat MCP as a transport layer that will faithfully carry whatever authority you give it.
MCP is not magic. The tools you install are the tools you get. If no one has written an MCP server for the thing you want to integrate with, you have two options: find a close-enough existing server (a generic HTTP fetch server, a shell-command runner, a scripting server) or write one yourself. Writing one yourself is surprisingly cheap — the reference TypeScript and Python SDKs make a minimum-viable server about fifty lines of code — but it is still work, and no amount of AI enthusiasm will conjure a server that doesn’t exist yet.
Configuring MCP Servers
Now that you know what MCP is and why GSD cares, let’s add one. There
are three ways to register a server with Claude Code: the
claude mcp CLI, a project-scoped .mcp.json
file, and the user-scoped ~/.claude.json file. All three
produce the same runtime behavior; they differ only in scope and in how
much typing you do.
Adding Servers via the CLI
The claude mcp add command is the fastest way to
register a server without hand-editing JSON. Two examples cover the two
transports:
# Register a local stdio server (the Stitch proxy, which we'll meet
# in Chapter 7). The "--" separator marks the end of claude's own
# flags and the start of the subprocess command line.
claude mcp add stitch --transport stdio -- npx @_davideast/stitch-mcp proxy
# Register a remote HTTP server at user scope (-s user means every
# project on this machine will see it).
claude mcp add my-api --transport http https://api.example.com/mcp -s userThe CLI writes the entry to the appropriate config file for you —
project or user scope, depending on the -s flag — and from
that point on, the server is indistinguishable from one you added by
editing JSON directly. Use the CLI for one-shot experimentation; use
JSON files when you want to commit the config alongside the rest of your
project or when you want finer control over env-var injection.
Project-Scoped Configuration: .mcp.json
A .mcp.json file in the root of your project declares
servers that are specific to this project and nowhere else. The shape is
dead simple:
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]
}
}
}The top-level mcpServers object holds one entry per
server, keyed by a short name of your choosing (this name becomes the
prefix on tool calls — so filesystem.read_file would be the
fully qualified tool name in the above example). The
command is the binary to launch, and args is
the argv passed to it. That’s the minimum viable config.
User-Scoped Configuration: ~/.claude.json
The user-scoped file lives at ~/.claude.json and has the
same shape, just in a different place. Entries here apply to every
project you open in Claude Code, unless a project-local entry with the
same server name overrides it. Use the user-scope file for personal
utilities: your filesystem server pointing at ~/projects,
your GitHub server with your personal access token, whatever
daily-driver search MCP you’ve settled on.
Project vs User Scope: How to Decide
The rule of thumb is simple. If the tool is specific to this codebase — a database server configured with this app’s connection string, a Stitch project that holds this app’s UI, a staging API endpoint you only ever hit from this repo — use project scope. If the tool is a personal utility that you’d want in every session — filesystem, GitHub, your search engine of choice — use user scope. When in doubt, start at user scope and move it down to project scope only if it turns out to need project-specific credentials or paths.
A slightly more nuanced version of the same rule: ask yourself what would happen if a teammate cloned your repo and ran Claude Code in it. Would they need this server to reproduce your work? If yes — the database MCP, the Stitch MCP pointing at a shared project, the deployment API wrapper — it belongs in project scope, committed (minus credentials) so the next developer gets it for free. If no — your personal filesystem server, a search MCP pointing at your own notes, a weather server you installed for fun — it belongs in user scope where it stays with you and doesn’t clutter the shared config. The tiebreaker, when both feel true, is whether the tool holds state that is specific to this project. A Postgres MCP holding this app’s connection string clearly does. A GitHub MCP holding your personal access token clearly does not, even if you only ever use it in one repo at a time.
There’s a third scope worth knowing about, even though most users
will never touch it directly: the session-local scope created by
claude mcp add without either -s project or
-s user. This adds a server for the duration of the current
Claude Code session only, in-memory, with no file changes. It is useful
for truly one-shot experiments — you want to poke at a new server for
ten minutes without committing to it — and it vanishes when you exit the
session. Treat it as a try-before-you-buy mode.
Environment Variables in Config
Hardcoding an API key into .mcp.json is a loaded gun.
Even if you never commit the file, it still sits on your disk in
plaintext, readable by any process running as your user, and any casual
cat .mcp.json will print it to a terminal that might be
scrolled back or screen-shared or logged by an operator tool. The
correct pattern is env-var interpolation: store the secret in your shell
environment and reference it by name in the config.
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
}
}
}
}The values shown above are placeholders. Never
commit real API keys, OAuth tokens, or database connection strings to a
file that will be checked into version control. Use environment
variables (${VAR_NAME}) and add .mcp.json to
.gitignore whenever it carries credentials.
The env block passes environment variables into the
subprocess. Values of the form ${NAME} are expanded from
your own environment at the moment Claude Code spawns the server — so
the real token lives in your shell profile (or a secrets manager, or a
dotenv file that is itself gitignored), not in the config file. If the
variable is unset, the substitution leaves an empty string and the
server will almost always fail with an auth error; that’s the correct
failure mode.
Add .mcp.json to .gitignore whenever it
carries API keys, OAuth client secrets, or database connection strings —
even if you’re using env-var interpolation, because the shape of the
file itself can leak information about your infrastructure. The
user-scope file ~/.claude.json lives outside any repository
so it is safer for shared credentials by default, but you should still
treat it like an SSH key: tight file permissions, no copies in shared
drives, no pasting into chat tools. A credential that leaves your laptop
is a credential that has leaked, regardless of where it ended up.
One more footgun worth flagging: some editors and IDEs offer to
“helpfully” open .mcp.json in a JSON schema view that pings
a remote schema registry. Disable this for any config file that contains
internal endpoint URLs or server names you’d rather not publish to
someone else’s telemetry. The GSD guide team has seen this happen once.
Once is enough.
Scenario — Google Stitch + GSD
Google Stitch is an AI-powered UI generation product built on Gemini 2.5 Pro. You describe a screen in plain English (or hand it a reference image), and it produces production-ready frontend code in the framework of your choice — HTML/CSS, React, Vue, Angular, all with Tailwind variants. For developers who can ship backends in their sleep but whose last three side projects died because the UI looked like a 2003 wireframe, this is the exact tool you’ve been waiting for.
Stitch ships with an official MCP integration in the form of the
@_davideast/stitch-mcp npm package. The package provides
two things: an interactive init wizard that handles the
painful Google Cloud setup, and an MCP proxy server that your Claude
Code client connects to. The proxy is the recommended entry point, and
the rest of this scenario walks through why.
The Stitch MCP server exposes the following tools:
create_project, generate_screen_from_text,
get_screen, get_screen_code,
get_screen_image, build_site,
list_projects, and list_screens. You won’t
need to invoke any of them by hand — GSD’s planner and executor
subagents will find them in the tool list and wire them into the plan
for you once you’ve completed setup.
Step 1 — Admit that the problem is real.
You have a meal-planning app in your head. Onboarding screen, dashboard, recipe browser, meal calendar, shopping list: five screens total. You can write the backend this weekend, and you have written it in your head three times already. What has always stopped you is that the UI from each of those mental iterations looked like a homework assignment. You don’t want to learn Figma. You don’t want to be tied to one design tool for the rest of your career. You do want to ship. This is the problem Stitch is trying to solve, and the question this chapter answers is what happens when you hand that solved problem to GSD.
Step 2 — Set up a Google Cloud project and enable the Stitch API.
Open the Google Cloud Console (console.cloud.google.com), sign in with the Google account you want to use for Stitch billing, and either create a new project or pick an existing one from the project dropdown. From the API & Services section, search the product catalog for “Stitch” and click Enable on the product page. You’ll be asked to confirm or attach a billing account; API usage bills against that account at whatever rate Google currently publishes, so check the current pricing page before you enable it if you’re cost-sensitive.
Two notes about this step that will save you a headache. First, the Stitch API is not enabled by default on any project, including new ones — you always have to click Enable explicitly. Second, an active billing account is required; the free tier is metered, not permanent, and a project without billing will fail at the first tool call with an authentication-like error that isn’t actually an authentication problem. Make sure billing is attached.
Step 3 — Install stitch-mcp.
# Run the interactive initializer. It will walk you through
# gcloud auth, project selection, and config generation.
npx @_davideast/stitch-mcp initThe init wizard is the part you will love. It runs
gcloud auth application-default login for you if gcloud is
installed and you aren’t already signed in, shows you the list of
projects you have Stitch access to, lets you pick one, and then prints a
ready-to-paste .mcp.json snippet at the end. It also tells
you which environment variables it expects for the API-key path if you’d
rather go that route instead of OAuth.
Step 4 — Choose an auth method.
There are three ways to authenticate a Stitch MCP connection, and getting this decision right on day one is worth more than any other detail in this walkthrough. Here’s the comparison:
| Method | Setup | Token Lifetime | Best For |
|---|---|---|---|
| API Key | Set STITCH_API_KEY env
var |
Permanent | Solo developers, quick local testing |
| OAuth (gcloud) | gcloud auth application-default login |
1 hour, auto-refreshed by proxy | Teams, CI/CD, daily driver |
| Remote MCP (Google native) | claude mcp add stitch –transport http |
1 hour, MANUAL refresh | Quick experiments only |
The numbers in the middle column are load-bearing. Google OAuth
application-default tokens have a one-hour lifetime —
this is a Google-wide policy, not a Stitch-specific limit, and no amount
of configuration will extend it. What differs between rows 2 and 3 is
who handles the refresh. The local proxy from
@_davideast/stitch-mcp watches the token’s expiration,
re-runs the refresh flow in the background when the token gets close to
expiring, and transparently keeps your session alive for as long as
Claude Code is running. A manually-added remote MCP does not do this —
the token expires in the middle of a working session, your next
generate_screen call fails with a 401, and you have to
manually re-authenticate before you can continue.
For anything resembling daily work, the OAuth-via-proxy path is the correct choice. The auto-refresh is the single biggest quality-of-life feature in the whole integration. The API-key path is legitimately fine for solo developers who want to get started quickly, but every committed key is a potential leak and every rotation is manual; budget for that if you go that way. The remote-MCP path is genuinely only useful for a quick experiment: spin it up, poke at one screen, tear it down. Anything longer and you’ll spend more time re-authenticating than building.
Step 5 — Add MCP config to
.mcp.json.
Paste the config block from the init wizard into a
.mcp.json file at the root of your project. It will look
like this (the proxy path, which is what you want):
{
"mcpServers": {
"stitch": {
"command": "npx",
"args": ["@_davideast/stitch-mcp", "proxy"]
}
}
}The values shown above are placeholders. Never
commit real API keys, OAuth tokens, or database connection strings to a
file that will be checked into version control. Use environment
variables (${VAR_NAME}) and add .mcp.json to
.gitignore whenever it carries credentials.
If you decided during Step 4 to go the API-key route instead, the
config shape is slightly different — you supply the key through the
env block rather than relying on ambient gcloud
credentials:
{
"mcpServers": {
"stitch": {
"command": "npx",
"args": ["@_davideast/stitch-mcp", "proxy"],
"env": {
"STITCH_API_KEY": "${STITCH_API_KEY}"
}
}
}
}The values shown above are placeholders. Never
commit real API keys, OAuth tokens, or database connection strings to a
file that will be checked into version control. Use environment
variables (${VAR_NAME}) and add .mcp.json to
.gitignore whenever it carries credentials.
Reminder from Chapter 6: any .mcp.json that carries
credentials — API keys in env, connection strings, bearer
tokens — belongs in .gitignore. The env-var interpolation
pattern (${STITCH_API_KEY}) pushes the actual secret out of
the committed file and into your shell environment, which is strictly
better than a literal, but doesn’t replace gitignoring the file entirely
for credential-bearing configs.
Step 6 — /gsd:new-project.
Now the fun part. Fire up Claude Code in your empty project directory and run:
/gsd:new-project a meal-planning app with onboarding, dashboard,
recipe browser, meal calendar, and shopping list --- 5 screens
total. Use Stitch for UI generation with React and Tailwind.GSD’s new-project command writes .planning/PROJECT.md
with a one-paragraph description, a rough feature list, and a detected
tech stack. This is also the moment it kicks off the discuss
phase, which is the structured conversation where design decisions get
locked in before any code is written. At this point GSD does not yet
know it will be using Stitch — it has inferred “React” and “Tailwind”
from your prompt, but it hasn’t yet seen the Stitch tools in its tool
list because the discuss phase hasn’t run. That’s fine. The next step is
where the MCP layer starts to earn its keep.
Step 7 — /gsd:discuss-phase 1.
The discuss phase is where you lock in decisions that the planner
will treat as constraints. This is where you commit to React + Tailwind
as the framework (so the planner doesn’t hedge later), where you decide
whether to let Stitch pick a color palette or supply one yourself, and
where you give the planner the Stitch project ID it will need to call
the MCP tools. Here’s a fragment of what you might see in the resulting
DISCUSS.md:
## Design Decisions --- Phase 1
Framework: React 18 + Tailwind CSS 3
UI tool: Google Stitch (via stitch-mcp proxy)
Stitch project ID: YOUR_PROJECT_ID
Color palette: defer to Stitch (will regenerate if needed)
Target screens: onboarding, dashboard, recipes, calendar, shopping
Rationale: Stitch generates React + Tailwind natively, so no
transpilation step. Letting Stitch pick the palette on the first
pass gives us a starting aesthetic to react to --- easier than
designing from a blank page.The Stitch project ID is the key piece. Without it the planner can
reach the Stitch MCP’s tool list but won’t know which project to operate
in; with it, every subsequent tool call will target the correct project
automatically. Grab the ID from Stitch’s project listing (or from
stitch.list_projects if you’d rather ask the MCP server
itself) and paste it into the discuss phase output.
Step 8 — /gsd:plan-phase 1.
With the discuss phase committed, run plan-phase. The planner
subagent spawns with access to Claude Code’s full tool surface — which
now includes every tool the Stitch MCP exposes. It reads
DISCUSS.md, sees that Stitch is the chosen UI tool, sees
stitch.generate_screen_from_text sitting in its tool list,
and writes plan steps that call it directly.
Here’s the flavor of what the resulting PLAN.md will
look like once the planner is done:
Task 3: Generate onboarding screen via Stitch
Tool: stitch.generate_screen_from_text
Inputs: {
project_id: YOUR_PROJECT_ID,
prompt: "Onboarding screen for a meal planning app. Three
steps: welcome, dietary preferences, household size.
Warm, approachable tone. Tailwind utility classes.",
framework: "react-tailwind"
}
Output: HTML/CSS in src/screens/Onboarding.tsx
Depends on: Task 1 (project scaffold), Task 2 (stitch auth check)
The planner generates one of these blocks per screen — five screens,
five Stitch calls — plus the usual scaffolding, routing, and integration
tasks. Notice that the planner is writing tool-specific inputs with real
Stitch-compatible field names (project_id,
framework, prompt). It knows to do that
because the MCP tool’s schema tells it what arguments the tool accepts.
Schemas beat prose every time.
Step 9 — /gsd:execute-phase 1 and
/gsd:ui-review 1.
Execute-phase is where the rubber meets the road. The executor subagent walks through the plan task by task, calling MCP tools for the tasks that declare them, and writing real files on disk as the tools return. You’ll see progress output along the lines of:
[Task 3/14] Generate onboarding screen via Stitch
-> Calling stitch.generate_screen_from_text
-> Received screen_id = scr_a8f20b1e
-> Calling stitch.get_screen_code (framework=react-tailwind)
-> Wrote src/screens/Onboarding.tsx (2.4 KB)
-> Calling stitch.get_screen_image (for reference asset)
-> Wrote docs/screens/onboarding.png (47 KB)
[Task 3/14] COMPLETE
Each screen produces a real file in src/screens/ and a
reference PNG in docs/screens/. The reference PNGs are not
strictly necessary but they’re worth keeping: they’re what Stitch’s
renderer produced for the same prompt, and later on they make it much
easier to reason about visual drift when you iterate on the copy or the
palette.
Once the executor has run through all five screens, run
/gsd:ui-review 1. The ui-review command audits the
generated UI against GSD’s six-pillar UI standards: accessibility,
usability, consistency, information density, responsive behavior, and
visual polish. It will flag missing alt text, contrast failures,
keyboard traps, and the other quiet ways AI-generated UI can be
technically correct and practically broken. Stitch’s output is generally
good, but it is not infallible, and ui-review is where you find out.
The values shown above are placeholders. Never
commit real API keys, OAuth tokens, or database connection strings to a
file that will be checked into version control. Use environment
variables (${VAR_NAME}) and add .mcp.json to
.gitignore whenever it carries credentials.
Stitch can generate in at least six frameworks (HTML/CSS, React, Vue,
Angular, with or without Tailwind). Tell GSD which framework you want
during the discuss phase, not the execute phase. The
planner will then write framework-specific task inputs that the executor
can hand directly to stitch.get_screen_code without a
conversion step. Waiting until execute-time means the planner has
already committed to a structure it now has to rewrite.
Scenario — Database MCP + GSD Planning
GSD’s planner is smart but not clairvoyant. Ask it to “add a
user_preferences table” in a project where it has no
visibility into the database, and it will produce a plausible migration
against a plausible schema — not your schema. The columns it picks will
be reasonable guesses, the foreign key names will look like what a
textbook would suggest, and the migration will fail the moment it meets
the real database. Worse, a junior reviewer glancing at the plan might
not notice.
A Postgres MCP server fixes this by giving the planner read-only
introspection into the actual schema. With the MCP installed, the
planner issues real SELECT statements against
information_schema before it drafts the migration, and the
resulting plan references the columns that actually exist. The rest of
this chapter walks through the setup.
Step 1 — Install a Postgres MCP server.
The Anthropic-maintained reference server
@modelcontextprotocol/server-postgres is the safest
starting point. It is deliberately read-only: the only
tool it exposes is a query function that runs against the
database, and the server wraps every query in a read-only transaction so
even a DELETE inside the query text cannot mutate data.
That’s a feature, not a limitation. For planning work, read-only is all
you need and all you want.
# No install step is needed -- npx fetches the package on demand
# when Claude Code launches the subprocess. But you can pre-cache
# it to shave a few seconds off the first session:
npx -y @modelcontextprotocol/server-postgres --versionStep 2 — Configure with a connection string.
Add the server to your .mcp.json with a connection
string supplied as an environment variable. Never hardcode the string
into the file — database URLs contain credentials by definition, and a
credential in a config file is a credential one git add
away from leaving the building.
{
"mcpServers": {
"postgres": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres"],
"env": {
"DATABASE_URL": "${DATABASE_URL}"
}
}
}
}The values shown above are placeholders. Never
commit real API keys, OAuth tokens, or database connection strings to a
file that will be checked into version control. Use environment
variables (${VAR_NAME}) and add .mcp.json to
.gitignore whenever it carries credentials.
Export DATABASE_URL in your shell before starting Claude
Code (from a dotenv file, your secrets manager, or a line in your shell
profile — whichever you prefer). The substitution happens when Claude
Code spawns the MCP subprocess, so the value needs to be set in the
shell that launches Claude Code, not just in Claude Code’s own session.
And yes: this .mcp.json belongs in .gitignore
for exactly the reasons Chapter 6 covered.
Step 3 — /gsd:plan-phase with schema
visibility.
With the Postgres MCP running, start a plan-phase command that will require schema knowledge. For example:
/gsd:plan-phase 2 add a user_preferences table with notification
toggles, theme selection, and default meal calendar viewThe planner subagent sees postgres.query in its tool
list and does the right thing without being told: before drafting any
migration steps, it inspects the existing schema with queries like:
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'users'
ORDER BY ordinal_position;
It does this for every table the new migration might touch, not just the obvious ones, because it was trained to build context before acting. The returned rows get folded into the planning context, and the migration steps the planner writes are grounded in what actually exists.
Step 4 — Observe PLAN.md referencing real
columns.
The resulting plan will include a block that looks like this:
Task 5: Migration -- create user_preferences table
Preconditions verified via postgres.query:
- users table exists with columns (id, email, created_at,
updated_at, last_login_at)
- no existing user_preferences table
Migration SQL:
ALTER TABLE users ADD COLUMN preferences JSONB;
-- or, if we prefer a separate table:
CREATE TABLE user_preferences (
user_id INTEGER PRIMARY KEY REFERENCES users(id),
...
);
That comment block listing preconditions is the key output of the whole exercise. A human reviewer can look at the plan and verify, at a glance, that the planner saw the real schema when drafting the migration. The reviewer’s job is no longer “spot-check every column name against the real database” — it is the much smaller job of “confirm the preconditions the planner listed are the right ones.”
Step 5 — Cleanup.
There is no cleanup. The Postgres MCP server is read-only and stateless: it holds no data on disk, writes no files, and makes no changes to your database. Killing Claude Code ends the subprocess; starting a new session respawns it from scratch with the same config. The only persistent trace is whatever lives in your query logs, which is a database-side concern and not a GSD one.
The Postgres MCP server is read-only. If you want the generated migrations to actually run, that’s a separate concern — GSD plans them, you (or your CI) execute them through your normal migration tool (Flyway, Liquibase, Alembic, sqlx-migrate, whatever you already use). Do not look for an MCP server that will run migrations for you. The separation of planning from execution here is not a limitation; it is the safety boundary that keeps a planning error from becoming a production outage.
Building MCP-Aware GSD Workflows
The two scenarios in Chapters 7 and 8 each used one MCP server in isolation. Real projects accumulate servers the way shell profiles accumulate aliases — a filesystem server, a GitHub server, a Postgres server for the main database, maybe a separate analytics Postgres, Stitch for UI, a custom server that wraps your internal deployment API. This chapter is about making the whole pile work together, and about the edges where MCP integration gets interesting.
Combining Agent Skills with MCP
Part I of this guide introduced agent skills — the small project- scoped prompt fragments that get injected into a subagent’s context whenever it runs in this project. Skills and MCP tools compose in a way that is more powerful than either one alone.
The trick is that a skill can refer to an MCP tool by name. A skill
is just prose injected into the planner’s or executor’s prompt, and if
that prose says “when planning database migrations, always call
postgres.query first to inspect existing tables,” the
planner will do exactly that. The skill doesn’t need to know how
postgres.query works; it just needs to mention the name.
The MCP server already told Claude Code about the tool’s schema during
discovery, so the planner can fill in the arguments correctly on its
own.
This pattern is how you bake team conventions into a project. Your
team decides that every new feature needs a database schema check before
planning? Write a one-paragraph skill that says so, mention the
postgres.query tool by name, and every subsequent
/gsd:plan-phase run will follow the rule without anyone
having to remember to invoke it. The same trick works for any MCP
server: a skill that says “for any UI task, always call
stitch.list_projects first to confirm the Stitch project
exists” costs you a paragraph and saves you a class of “undefined
project” errors you would otherwise hit at execute time.
A worked example makes this concrete. Suppose your team has agreed
that every database-touching phase must start by reading the current
schema, and every UI-touching phase must start by confirming the Stitch
project exists and has the expected screen count. You can bake both
rules into a skill like this (placed in
.claude/skills/team-conventions.md or wherever your
project’s agent skills live):
# Team conventions for planning and execution
## Before any database migration task
1. Call postgres.query with a schema inspection query
against information_schema.tables and information_schema.columns
for the affected tables.
2. Include the returned column list in the task preconditions.
3. Only then draft the migration SQL.
## Before any Stitch-generated UI task
1. Call stitch.list_projects and confirm the configured project
ID is present.
2. Call stitch.list_screens for that project and record the
current screen count.
3. If the screen count differs from what DISCUSS.md claims, stop
and surface the mismatch instead of generating blindly.
The skill is maybe thirty lines of prose. The behavior change is substantial: you stop finding out about schema drift at execute time and start finding out about it at plan time, when fixing it is cheap. Notice that this pattern does not require any changes to GSD’s source code, to the MCP servers themselves, or to Claude Code’s configuration. It is entirely text, and the text flows through the same discovery-and-injection paths you already use for every other skill.
The reason this works is worth pausing on. Subagent prompts are assembled fresh at the start of each subagent invocation: the base template, the current phase’s context, any applicable skills, and the current tool manifest. When a skill mentions a tool name, the planner sees that mention and the tool’s schema side by side. It doesn’t have to guess what arguments the tool takes, because the schema is right there. It doesn’t have to be reminded to use the tool, because the skill says so in plain language. Skills and MCP compose cleanly because they’re both contributing to the same assembled prompt from different directions.
The GSD Plugin Format (Issue #1883)
There’s a larger integration idea in the pipeline. GSD issue #1883 tracks a plan to package GSD itself as a Claude Code plugin — a single installable unit that would bundle GSD’s skills, subagent templates, and a dedicated MCP server, all in one place. The promise is that a user could run one install command and get the entire GSD experience, with its own MCP surface exposed as a first- class tool set, rather than having GSD live alongside MCP as a separate concern.
This is on the roadmap for v1.32 or later — the plugin format is still evolving, and the schema the plugin manifest will use is not yet fixed. Committing to a specific shape in this guide would be premature. The thing to know is that it’s coming, that it is specifically designed to make MCP integration a first-class part of GSD rather than an afterthought, and that the scenarios in this part will still work on the plugin version because the underlying primitives are unchanged. Follow the issue if you want early visibility.
Observability: Knowing What Your MCP Servers Did
An underappreciated part of running MCP at scale is knowing what your servers have actually been doing. Claude Code’s conversation log shows you every tool call the assistant made, but once you have five or six MCP servers running, that log can get noisy, and correlating “this file on disk” with “this tool call that produced it” is not always easy from the transcript alone.
Three patterns help. First, most MCP servers can be started in a
verbose or debug mode that prints each tool invocation to stderr. Claude
Code surfaces server stderr in its own logs, so turning on verbose mode
on a suspect server is often the fastest way to catch a misbehaving
call. The flag is server-specific (look for –verbose,
–log-level debug, or an environment variable like
LOG_LEVEL=debug), but it exists almost everywhere because
MCP server authors are themselves debugging their own servers against
Claude Code and need the same signal.
Second, for servers you write or control, emit a one-line audit record per tool call: timestamp, tool name, argument summary, outcome, duration. Pipe it to a file or to your logging pipeline. When something goes sideways at execute time, the audit log tells you what actually happened versus what the plan says was supposed to happen — and those two narratives are not always the same.
Third, for production-grade setups, consider wrapping your MCP servers behind a single reverse-proxy MCP that logs all traffic and then forwards calls to the appropriate underlying server. This adds a hop but gives you one place to look for everything. It’s overkill for personal projects and exactly right for shared team infrastructure where “who called what, when, against which database” is a compliance question with an actual answer.
Version Drift: What Happens When a Server Changes
Your MCP servers will change under you. The Stitch proxy will publish a new version that renames a tool argument. The Postgres server will add a new capability you’ve been wanting. A third-party server you depend on will be deprecated in favor of a successor that is almost but not quite backwards-compatible. None of this is exotic; it’s normal package ecosystem behavior, and MCP is no different from npm or pip in that regard.
GSD copes with this better than you might expect, because the discovery step happens at the start of every session. The planner sees the current set of tools with the current schemas — not a cached snapshot from when you first configured the server. If a tool’s argument list changes overnight, the next plan-phase run will use the new shape without you having to update anything in GSD. If a tool disappears entirely, the planner will stop proposing it, because it won’t see it.
The failure mode to watch for is a skill that names a tool
explicitly. If your team-conventions skill says “always call
postgres.query” and someone upgrades the Postgres MCP to a
version where that tool was renamed to postgres.execute,
the skill will keep asking for the old name, the planner will try to
call it, and the call will fail. The fix is to update the skill to
match. A short CI check that grep-s your skills for known MCP tool names
and cross-references them against a list of currently- installed servers
is cheap and will save you a debug session.
A short checklist of what you should take away from Part II:
Project-scoped servers for project-specific tools: the database for this app, the Stitch project for this UI, the staging endpoints for this service.
User-scoped servers for personal utilities: filesystem, GitHub, your daily-driver search MCP, anything you’d want available in every Claude Code session.
Env-var your credentials every time, without exception. Never paste a real API key into a committed file, and never paste one into a file you’re “about to” gitignore.
.mcp.jsonbelongs in.gitignoreany time it carries secrets, even if you’re using env-var interpolation. The shape of the file itself can leak information you don’t want to publish.For teams, prefer OAuth with service accounts over shared API keys. Per-user auth with proper rotation is the only approach that survives a team member leaving. Shared keys are a time bomb with a known fuse length.
Token expiration is real: OAuth tokens die after an hour. Use proxies that auto-refresh. Do not use raw remote MCP auth for anything you plan to run longer than a demo.
The overarching theme of Part II is that MCP is plumbing, and good plumbing disappears. Once your servers are configured, the gitignore is in place, and the env vars are exported, you stop thinking about MCP at all. GSD’s planner writes plans that use the tools you’ve plugged in, the executor calls them, and the outputs land on disk. The configuration work is upfront and small. The payoff is that every subsequent plan and every subsequent execution has access to the real state of your project — the real schema, the real Stitch project, the real repository — and the quality of the resulting work rises accordingly. That is the entire argument for connecting GSD to MCP, and it is the reason Part III will assume all of this is in place when it turns to multi-repo and monorepo workflows.
Multi-Repo Workflows
Workstreams vs. Workspaces — When to Use Which
Parts I and II taught GSD to speak your project’s dialect and to reach outside of itself. Part III is about a more mundane but equally painful problem: real projects are rarely a single repository with a single timeline. Even when a project is a single repo, it often runs several independent tracks of work in parallel — a backend feature, a frontend redesign, an infrastructure migration, a one-off experiment — each with its own planning cadence, its own owners, and its own definition of “done”. And when the project genuinely spans multiple repositories, as microservice fleets and polyrepo libraries often do, the coordination problem multiplies.
GSD has two different answers to that problem, and learning when to reach for each is the difference between a smooth multi-track workflow and a pile of overwritten STATE.md files.
Workstreams: One Repo, Many Timelines
Workstreams were introduced in GSD v1.28. The idea is dead simple:
if you already have one repo, keep one repo, but give each
independent timeline its own isolated planning subtree. Under the
hood, a workstream is a directory under
.planning/workstreams/ that contains its own STATE.md,
ROADMAP.md, REQUIREMENTS.md, and phase directories — a complete parallel
planning tree, scoped under a name of your choice. When you switch to a
workstream, GSD transparently re-roots all its planning operations to
read and write from that subtree. The root .planning/ is
untouched.
The commands are short and predictable.
/gsd:workstreams create <name> creates a new one.
/gsd:workstreams switch <name> makes it the active
workstream for subsequent commands. /gsd:workstreams list
shows all of them. /gsd:workstreams complete <name>
archives one when its feature ships. The switch itself is essentially
free — GSD just updates a pointer to the active workstream directory —
so you can flip between two or three of them many times a day without
thinking about the cost.
What workstreams don’t do is give you git isolation. The
code is still the same checkout. If two workstreams both want to edit
src/billing.ts on the same branch, they will conflict in
the same way they would without workstreams. Workstreams isolate
planning state, not source state. For many real
projects that distinction is exactly right, because the planning
collision is the painful one and the source collision is already handled
by git.
Workspaces: Many Repos, One Plan
Workspaces are the heavier instrument, also introduced in v1.28. A
workspace is a separate directory — by convention under
~/gsd-workspaces/<name>/ — that contains one or more
named repositories, each checked out as either a git worktree or a full
clone, plus a workspace-level manifest file called WORKSPACE.md. Each
repo inside the workspace has its own independent
.planning/ subtree. The workspace itself is the
coordination layer: it knows which repos are in the workspace, tracks
cross-repo status in its manifest, and lets you run coordinated planning
without having to merge the repos into a monorepo you didn’t want.
The commands mirror the workstream ones but operate at the repo-set
level.
/gsd:new-workspace –name v2-launch –repos api,web-frontend,admin-dashboard
creates a workspace named v2-launch containing three repos.
/gsd:list-workspaces shows all configured workspaces.
/gsd:remove-workspace v2-launch tears one down when it’s
done. The –strategy flag takes one of two values:
worktree(the default) creates a git worktree of each named repo. Worktrees share object storage with the parent checkout, so they’re cheap to create and cheap to keep around. The trade-off is that you can’t have two worktrees on the same branch of the same repo at the same time — git refuses.cloneperforms a fullgit cloneof each named repo into the workspace directory. This is slower and uses more disk, but it gives you complete isolation — the workspace’s copy of each repo is genuinely independent from your main checkout and can sit on any branch without conflict.
Workspaces cost more to create than workstreams because they involve actual git operations — worktree creation or cloning — and a new directory tree on disk. A workspace with three repos takes a few seconds to materialize, not the sub-millisecond switch of a workstream. That’s the right cost when you actually need repo-level isolation, and the wrong cost when you don’t.
Decision Table
The shortest path to the right answer is to ask: do my parallel timelines share code, or do they share planning? If they share code, a workstream is what you want. If they need different code or different git state, a workspace is what you want.
| Scenario | Use | Why |
|---|---|---|
| Backend + frontend in same repo | Workstream | Same code, different planning timelines. The extra cost of a workspace buys you nothing. |
| Three microservices needing coordinated planning | Workspace (multi-repo) | Different repos, each already has its own
.planning/. The workspace is the coordination layer. |
| Feature branch isolation, current repo only | Workspace –repos . |
A single-repo workspace gives you a worktree of the current repo with its own independent planning state. |
| Disposable spike or prototype | Workspace
–strategy clone |
Full isolation. When the spike dies, remove the workspace and nothing leaks back into the main checkout. |
| Refactor touching several repos atomically | Workspace
–strategy worktree |
Coordinated planning across the set, shared git fast-path under the hood. |
Workstreams are much cheaper than workspaces. If the work you’re about to start fits in a single repo, default to creating a workstream and only escalate to a workspace if you genuinely need git-level isolation. It is always easier to graduate a workstream into a workspace later than it is to undo a premature workspace.
One More Consideration: How Long Does the Work Live?
A secondary question worth asking before you pick is how long the timeline is expected to live. Workstreams are well suited to long-lived, repeating tracks: “the backend team’s current feature”, “the ongoing observability migration”, “this quarter’s performance work”. They rotate through many features over their lifetime, and the fact that their STATE.md persists across those features is useful.
Workspaces are better suited to cohorts of work that share a release — the kind of thing where you’d otherwise say “we’re shipping v2 across three services and I need to track all three together”. When the cohort ships, you remove the workspace. The repos themselves live on, but the workspace’s WORKSPACE.md and its coordination state get archived or deleted along with the launch it tracked.
Neither pattern is wrong for the other’s use case, but using a workstream for a launch cohort tends to feel fiddly (you end up making a workstream in each repo and hand-coordinating) and using a workspace for a long-lived single-repo track tends to feel heavy (you end up with a workspace directory that never quite gets cleaned up). Match the lifetime of the construct to the lifetime of the work.
Scenario — Monorepo with Independent Apps
You run a single git repository containing three apps under
apps/: a Node backend API, a Next.js frontend, and a React
Native mobile app. The three teams work on different cadences — the
backend team is mid-migration to a new ORM, the frontend team is
redesigning the onboarding flow, the mobile team is fighting a long-tail
of platform-specific bugs. They all share the same git history and a
fair amount of code under packages/, so splitting into
three repositories would be more disruptive than it is worth. But
running all three teams’ GSD plans against the same root
.planning/ has already produced one STATE.md corruption
this month, and you are done with that.
This scenario walks through giving each team its own workstream, running an independent GSD flow in each, and archiving them cleanly when features ship. Total elapsed time: about five minutes of commands, most of which are the normal GSD flow you already know.
Step 1. Starting layout.
The repository looks like this before you touch anything:
my-monorepo/
|-- apps/
| |-- backend/
| |-- frontend/
| `-- mobile/
`-- .planning/
|-- workstreams/
| |-- backend-api/
| | |-- ROADMAP.md
| | `-- STATE.md
| `-- frontend-redesign/
| |-- ROADMAP.md
| `-- STATE.md
`-- config.json <- shared config across workstreams
(The .planning/workstreams/ subtree is what we’re about
to create. The root .planning/config.json already exists
and stays where it is — it holds the shared config all workstreams
inherit from.)
Step 2. Create the backend workstream.
/gsd:workstreams create backend-apiGSD creates .planning/workstreams/backend-api/ and
scaffolds a fresh set of planning files inside it — an empty ROADMAP.md,
an initial STATE.md pointing at an un-planned phase, and a small
WORKSTREAM.md marker file that the tooling uses to detect that the
subtree is a workstream rather than a plain directory. The root
.planning/ is not touched. You should see output along the
lines of:
Created workstream: backend-api
Path: .planning/workstreams/backend-api/
Status: active
Step 3. Switch to the new workstream.
/gsd:workstreams switch backend-apiSwitching makes backend-api the active workstream. Every
subsequent /gsd:* command in this session will read and
write state from .planning/workstreams/backend-api/ instead
of the root .planning/. There’s no second-command mystery
here: if you forget to switch, /gsd:progress will tell you
which workstream it’s reporting on in its first line, so you catch the
mistake early.
Step 4. Run a normal GSD flow, scoped to backend.
At this point, everything looks and feels like vanilla GSD. You start a new project for the ORM migration:
/gsd:new-project
/gsd:discuss-phase 1
/gsd:plan-phase 1
/gsd:execute-phase 1The PLAN.md, STATE.md, and phase directories that come out of this
flow all live under .planning/workstreams/backend-api/. The
root .planning/ does not see any of it. If you run
/gsd:progress right now, you’ll get the backend
workstream’s view — phase 1, in progress, whatever your executor is
chewing on.
Step 5. Create the frontend workstream in parallel.
While the backend team keeps going, the frontend lead wants to start the redesign plan. In a new terminal (or just later in the same terminal), they run:
/gsd:workstreams create frontend-redesign
/gsd:workstreams switch frontend-redesign
/gsd:new-project
/gsd:discuss-phase 1This creates a second, entirely independent planning subtree at
.planning/workstreams/frontend-redesign/. The two
workstreams don’t see each other’s state files. They can be in
completely different phases, have completely different agent
configurations, use different project_code prefixes — the
isolation is at the planning-state level, not at the git level.
Step 6. List workstreams.
When you want to see what’s going on across the whole repo, ask:
/gsd:workstreams listThe output is a small table, one row per workstream, giving you name, status, current phase, and when it was last touched:
NAME STATUS PHASE LAST TOUCHED
backend-api active 1 3 minutes ago
frontend-redesign active 1 1 minute ago
This is the one command that crosses workstream boundaries. Everything else respects the active-workstream scope and only shows you one workstream’s view at a time.
Step 7. Complete and archive.
Two weeks later, the backend ORM migration ships. You finish the last phase, run your tests, push the merge, and then retire the workstream:
/gsd:workstreams complete backend-apiCompletion does not delete anything. GSD moves the workstream’s
directory to .planning/workstreams/archive/backend-api/,
sets its status to archived, and stops surfacing it in the
default list output. If you ever need to resurrect it — say
to investigate a post-ship question — you can switch back to the
archived workstream and treat it as read-only evidence.
Step 8. Cleanup.
Repeat the complete step for the frontend workstream
when its redesign ships. The root .planning/ has stayed
clean throughout: all the actual work lived under
.planning/workstreams/, and all the archived evidence lives
under .planning/workstreams/archive/. Your repo is ready
for the next set of parallel tracks, and there’s a searchable record of
what the old tracks did.
Each workstream’s /gsd:progress only reports on that
workstream’s state. If you want a cross-workstream view, run
/gsd:workstreams list. This is the right scoping default —
a backend dev should see backend state by default, not be distracted by
frontend state — but it does mean that “all-workstreams” views are an
explicit command rather than the default.
Scenario — Microservice Fleet
You have three separate repositories on disk: api,
web-frontend, and admin-dashboard. They’re
independent services with independent deployment pipelines, but they all
ship together as part of the v2 launch, and the product manager would
like a single view of “how is v2 going?” that doesn’t require opening
three different planning files in three different checkouts. A workspace
is exactly the right shape for this: it gives you a coordination layer
on top of the three repos without forcing you to merge them into a
monorepo.
This scenario walks through creating the workspace, running coordinated planning across all three repos, and tearing the workspace down cleanly when v2 ships.
Step 1. Create the workspace.
/gsd:new-workspace --name v2-launch \
--repos api,web-frontend,admin-dashboardGSD looks up each named repo — either via relative paths, or via
registered repo aliases (you can list the registered aliases with
/gsd:list-workspaces’s config output) — and materializes
the workspace directory under ~/gsd-workspaces/v2-launch/.
For each repo, GSD creates a git worktree (the default strategy)
pointing at the repo’s current branch and drops it inside the workspace.
It also writes a WORKSPACE.md manifest at the workspace root listing the
repos and their current phase status.
The whole operation takes a few seconds. At the end of it, you have a fully usable workspace containing three independently planned repositories, sitting side by side.
Step 2. Inspect the workspace tree.
~/gsd-workspaces/v2-launch/
|-- WORKSPACE.md <- workspace manifest (orchestrator-owned)
|-- api/ <- repo 1 (independent .planning/)
| |-- .planning/
| | |-- ROADMAP.md
| | `-- STATE.md
| `-- src/
|-- web-frontend/ <- repo 2 (independent .planning/)
| |-- .planning/
| `-- src/
`-- admin-dashboard/ <- repo 3 (independent .planning/)
|-- .planning/
`-- src/
WORKSPACE.md is the orchestrator-owned manifest — it tracks which
repos belong to the workspace and what the cross-repo phase status is.
Each repo below it is an independent checkout with its own
.planning/ subtree. The three .planning/ trees
are genuinely independent: they don’t share state, they don’t share
phase numbers, they don’t share REQUIREMENTS.md. The WORKSPACE.md
manifest is the only thing that knows about all three at once.
Step 3. cd into each repo and run its own GSD
flow.
Inside the workspace, each repo behaves like a normal GSD project.
You change into one repo, run the normal commands, and the normal files
land in that repo’s .planning/:
cd ~/gsd-workspaces/v2-launch/api
/gsd:new-project
/gsd:discuss-phase 1
/gsd:plan-phase 1Then the same in web-frontend and
admin-dashboard, each with its own plan. There is
deliberately no shared planning file across the three, because the teams
work in different languages, different release cadences, and different
levels of detail. What they share is the workspace manifest, which is
updated by the orchestrator — never by the individual executors — as
each repo’s phase status changes.
Step 4. Execute phases in each repo.
Phase work happens locally in each repo. When you run
/gsd:execute-phase inside api, the executor
operates on api’s files and writes to api’s
.planning/STATE.md. When the phase finishes, the
orchestrator reads the final state and updates WORKSPACE.md at the
workspace root so that the cross-repo view stays accurate. If you do
nothing, the manifest will drift behind; if you run the normal GSD flow,
it stays current without any extra commands from you.
Step 5. Strategy comparison: worktree vs. clone.
The –strategy flag at workspace-creation time is worth
understanding, because the two strategies have meaningfully different
cost profiles:
| Strategy | Speed | Disk | Isolation | Best for |
|---|---|---|---|---|
| worktree | Fast (seconds) | Minimal (shared objects) | Shared git history | Coordinated multi-repo planning; most launch cohorts |
| clone | Slower (full clone per repo) | Full working copy each | Complete | Spikes you may throw away; same-branch parallel work |
The worktree strategy is the default because it’s almost always what
you want. It shares git object storage with the parent checkout on disk,
so materializing three worktrees of large repos is essentially free. The
one constraint to be aware of is that git refuses to have two worktrees
of the same repo on the same branch at the same time; if the workspace
and your main checkout both want to live on main, the
workspace’s worktree has to be on a different branch.
Use the clone strategy when you specifically need full
isolation — for example, when you’re running a disposable experiment and
don’t want git objects from the spike leaking back into your main
checkout, or when you need the workspace and main checkout to sit on
exactly the same branch at the same time.
Step 6. List and inspect.
When you want to see all your workspaces, or pull up the manifest for one:
/gsd:list-workspaces
cat ~/gsd-workspaces/v2-launch/WORKSPACE.md/gsd:list-workspaces prints a table of every workspace
on the machine, showing name, repos, strategy, and last activity.
WORKSPACE.md is a plain markdown file, human-readable, that
captures the orchestrator’s cross-repo view — phase status per repo,
outstanding blockers, last update timestamp. You can skim it in any
editor without running GSD.
Step 7. Cleanup.
When v2 finally ships, you retire the workspace:
/gsd:remove-workspace v2-launchThis command asks you to confirm the workspace name before proceeding — a small guardrail against fat-fingering the wrong name and destroying an active workspace. Once confirmed, GSD removes the workspace directory, tears down the worktrees (git reports each one as pruned), and cleans up the WORKSPACE.md manifest. The underlying repos themselves are untouched: the api, web-frontend, and admin-dashboard checkouts elsewhere on your disk are completely unaffected.
/gsd:remove-workspace will refuse to
remove the workspace if any constituent repo has uncommitted changes.
The protection is intentional: uncommitted work in a worktree is easily
lost if the worktree is removed before being committed back, and “I just
meant to clean up” is exactly how people lose an afternoon’s work.
Either commit, stash, or push your work in each repo first, then re-run
the removal. If you are absolutely sure you want to discard the
uncommitted state, do it explicitly — stash or reset each repo by hand —
so that the decision is deliberate.
Part II introduced project-scoped .mcp.json as the right
way to configure an MCP server. That pattern composes cleanly with
workspaces: a Stitch MCP server configured inside
~/gsd-workspaces/v2-launch/web-frontend/.mcp.json lives
inside that workspace’s copy of the web-frontend checkout and does not
bleed into a separate workspace’s web-frontend checkout, or
into your main web-frontend checkout elsewhere on disk.
Multi-repo isolation and MCP scoping compose exactly the way you’d
expect them to.
Worktree Isolation Deep Dive
This chapter is the most technically delicate in Part III. The details matter, and one of them in particular — the STATE.md ownership rule — is the kind of thing that will silently corrupt your planning files if you get it wrong. Read carefully.
What workflow.use_worktrees: true Does
In .planning/config.json, a workflow setting controls
how parallel execution is sequenced:
{
"workflow": {
"use_worktrees": true,
"cleanup_worktrees": true
}
}When use_worktrees is true,
/gsd:execute-phase creates a temporary git
worktree for each parallel executor agent it spawns. Each agent gets its
own private filesystem view of the same git history, rooted at a scratch
directory under .planning/worktrees/ (or wherever the
config points). The agents do their work inside those scratch
directories, and once the phase completes, the orchestrator walks the
set and aggregates their results. If cleanup_worktrees is
also true, the scratch directories are removed after
aggregation.
Without worktrees, parallel executors would share the main checkout. That’s not a theoretical problem — it’s a concrete, reproducible one. Two agents editing the same file race. Two agents staging different versions of a change overwrite each other. A half-applied refactor gets committed on top of an unrelated agent’s fix and the git history becomes unreadable. Worktrees solve this by giving each agent a filesystem it doesn’t have to share with anyone else, while still keeping all of them rooted in the same git repository so the orchestrator can collect their output.
The STATE.md Ownership Rule
This is the critical constraint. Get it right, and parallel execution is safe. Get it wrong, and you reintroduce the lost-update bug that Issue #1571 was filed to fix.
The orchestrator owns STATE.md and ROADMAP.md. Parallel executor agents must not write to these shared planning files directly. Each agent writes a plan-local SUMMARY.md inside its own worktree. After each wave, the orchestrator reads the SUMMARY.md files from every worktree and aggregates them into STATE.md and ROADMAP.md in the main checkout. This single-writer model is the entire reason parallel GSD execution is safe. Removing it — even “just this once, for a small edit” — reintroduces the lost-update bugs that corrupted shared state before the fix in Issue #1571 landed.
The rule is small enough to state in one sentence: only the orchestrator writes to STATE.md or ROADMAP.md. Everything else in the design follows from that constraint.
What Agents Actually Write: SUMMARY.md
Each parallel executor agent writes exactly one file in its worktree that is visible to the rest of the system: a plan-local SUMMARY.md. The SUMMARY.md format is intentionally small and additive. It captures:
what the agent was asked to do (copied from its slice of the plan),
what it actually did (files touched, tests run, tests passing),
what it didn’t do and why (blocking dependency, unclear requirement, environmental failure),
any deviations from the plan that need to be reconciled at aggregation time.
The SUMMARY.md is written in the worktree, not in the main checkout. It lives next to the agent’s work, and it describes what that agent did within its own sandbox. The orchestrator is responsible for reading it and deciding what to do about it. The agent never tries to speak on behalf of the plan as a whole, and the agent never writes into STATE.md or ROADMAP.md in the main checkout or in its own worktree. If you find yourself writing an executor that edits STATE.md, stop. You are reintroducing #1571.
Post-Wave Aggregation
After all parallel executors in a wave finish, the orchestrator runs an aggregation step. It:
Enumerates the worktrees created for this wave.
Reads each worktree’s SUMMARY.md.
Reconciles the set against the master plan — which slices completed, which slices deviated, which slices blocked.
Updates STATE.md and ROADMAP.md in the main checkout with the aggregated result. This is the single write to the shared planning state for the entire wave.
If
cleanup_worktreesistrue, removes the worktrees and prunes the git worktree registry.
The important property is that STATE.md goes from pre-wave to post-wave in exactly one transition, authored by one process (the orchestrator), reflecting the aggregate of what all the executors did. There is never a window where two writers are touching STATE.md at the same time, because there is only ever one writer.
The git clean Prohibition
As of GSD v1.32 (Issue #2075), the executor wrapper
blocks git clean in worktree context. Do
not try to work around this. A stray git clean -fd can
delete the prior wave’s output before the orchestrator has had a chance
to aggregate it — including SUMMARY.md files that were the only record
of what just happened. If you need to remove untracked files inside a
worktree, do it explicitly per-file (rm path/to/file), not
en masse. The prohibition is there because the failure mode is silent:
you lose work and don’t notice until the aggregation step asks for a
SUMMARY.md that no longer exists on disk.
The prohibition applies to any variant of the command —
git clean -f, git clean -fd,
git clean -fdx — and the wrapper will reject the call with
a diagnostic that explains why. If you have a legitimate need to remove
large quantities of untracked files from a worktree, the right move is
to finish the wave, let the orchestrator aggregate, verify that STATE.md
is up to date, and then clean up outside of the worktree
mechanism.
Toggling Worktrees Off
Sometimes you don’t want worktree-based parallel execution. Setting
workflow.use_worktrees to false runs every
executor in the main checkout, serially. Use this when:
Your environment doesn’t support git worktrees cleanly. Some CI runners and some older Windows setups have limitations that make worktrees flaky. Serial execution is more boring but more predictable in those environments.
You’re debugging and want a single deterministic file tree. Worktrees make it harder to
cdinto “the” checkout and poke around, because there are several checkouts. Turning them off is a reasonable diagnostic step.You intentionally want sequential execution — for example, on a very small machine where the memory cost of running three agents in parallel is worse than the wall-clock cost of running them in series.
The trade-off is loss of parallelism. Phases will take longer to execute. The STATE.md ownership rule still applies in serial mode — the orchestrator is still the only writer — but the risk of races in the main checkout goes away because there is never more than one executor running at a time. If you find yourself fighting worktree edge cases and you don’t specifically need parallelism, the boring serial path is a legitimate choice and not a defeat.
The orchestrator-only-writes-STATE.md rule is what makes parallel GSD execution safe. If you ever find yourself tempted to have an executor agent write directly to STATE.md or ROADMAP.md — “just for this one case, to update a small field” — stop and reread Issue #1571. You are about to reintroduce the bug that the current design exists to prevent.
Team Patterns for Multi-Repo GSD
The previous chapters were about mechanics. This chapter is about the boring operational discipline that turns the mechanics into something a team of humans can actually live with. None of it is exotic. All of it is the kind of thing that quietly separates a team with a consistent GSD workflow from a team where every repo behaves slightly differently and no one can remember why.
Standardizing Config Across Repos
The single biggest lever for consistency is a shared
.planning/config.json template that every repo in your team
copies in. Without a template, every repo ends up with a slightly
different configuration — one has worktrees enabled, another doesn’t, a
third uses a different project_code prefix, a fourth has
three agent skill paths the others don’t know about. The teams don’t
notice at first, and then six months later somebody asks “why does the
billing repo’s GSD output look different from the accounts repo?” and
the answer is a sprawl of micro-decisions no one wrote down.
The fix is to commit a canonical template to a shared
gsd-templates repo and pull it into every new project via a
setup script. The template should pin the fields that matter for team
consistency:
workflow.use_worktrees— typicallytruefor teams that execute phases in parallel.workflow.cleanup_worktrees— typicallytrueto avoid accumulating stale worktree directories.project_code— the short prefix used for phase directories. Leave this as a template variable so each repo substitutes its own code at setup time.Any shared agent-skill paths — if your team has a library of skills living in a shared directory, the paths belong here so every repo picks them up.
Any shared verifier or reviewer configuration that your team applies uniformly.
What should not go in the team template is anything specific to a single repo: repo-specific build commands, local credentials, one-off overrides. Those belong in the repo’s own config, layered over the template.
MCP Server Scoping, Restated
Chapter 12 touched on this in passing and it’s worth restating
because it’s the single most common mistake teams make with MCP in a
multi-repo context: do not try to maintain a single global
~/.claude.json for a microservice fleet. Use project-scoped
.mcp.json files, one per repo, and commit the non-sensitive
parts.
The reason is that microservices are each their own context. The
billing service wants a connection to the billing database. The auth
service wants a connection to the auth database. If you wire both of
those into a single global MCP config, every Claude Code session in
every repo gets every database, and the failure modes are confusing —
wrong DB picked up, permissions errors from a service that should never
have been able to reach that DB in the first place, credentials leaking
across contexts. Project-scoped .mcp.json files keep each
service’s tool surface minimal and make it obvious from the repo alone
what external systems that service talks to.
Workspaces, as we saw in Chapter 12, compose cleanly with this
pattern. Each repo inside a workspace carries its own
.mcp.json, and a workspace doesn’t reach across its
constituent repos to share MCP servers.
Workstream Naming Conventions
Workstream names end up in /gsd:workstreams list output,
in .planning/workstreams/ directory listings, in commit
messages, and occasionally in PR descriptions. A little discipline here
pays off indefinitely:
Verb-noun format.
add-billing,migrate-auth,refactor-orm. Every workstream is doing something, and the name should make that obvious at a glance.No timestamps in the name. A workstream called
2026-04-backend-worklooks tidy at creation time and looks dreadful six months later when you have fifteen of them and none of the names tell you what the work was. Use creation timestamps via thelast touchedcolumn in/gsd:workstreams list, not in the name.Don’t use a bare ticket ID as the whole name. A workstream called
JIRA-1234is opaque. Pair the ticket ID with a short description:JIRA-1234-billing-flow. You get the ticket traceability and the at-a-glance readability at the same time, and the cost is a handful of extra characters.Short is better than long. Aim for workstream names that fit in a reasonable table column. 30 characters is plenty; 60 is painful.
None of these rules are enforced by the tooling. They are the kind of
thing you write down in your team’s gsd-templates repo next
to the config template and agree to honor.
The project_code Field
A small but underrated piece of team config is the
project_code field in .planning/config.json.
It’s a short prefix — three or four letters — that GSD uses when it
names phase directories and in some default commit message templates.
Setting it once:
{
"project_code": "BCS"
}means that phase directories come out looking like
BCS-phase-1-auth-refactor instead of just
phase-1-auth-refactor. When a human later skims a flat
directory listing — or reads a commit message that references a phase —
the prefix tells them immediately which project the phase belongs to. In
a single-repo workflow the prefix is mostly decorative. In a multi-repo
team workflow, where a reviewer might be looking at phases from several
different services in the same week, the prefix is a small but constant
cue that prevents cross-project confusion.
Pick short, memorable codes: BCS for
billing-core-service, PAY for payments, ADM
for admin-dashboard. Three or four characters, upper-case, no
punctuation. Record the mapping somewhere humans can find it — a short
table in your team README or in the gsd-templates repo
works fine.
Ticket-Based Phase Identifiers
GSD supports tagging phases with external ticket identifiers — the Jira issue, the Linear ticket, the GitHub issue number, whatever your team tracks work in. The mechanism is to record the ticket reference as phase metadata at planning time, and GSD will carry it through to commit message templates and PR descriptions automatically.
The details of which config key holds the ticket reference are version-specific and worth checking against your GSD release’s documentation before you commit to a convention. What matters at the team level is the pattern: choose one ticket system, pick one field name, put it in the team config template, and make sure every planner is using it. When it’s set up, you get traceable commits without any extra effort from the people doing the work — the ticket ID threads through from plan to commit to PR automatically.
The payoff shows up when you’re looking at a six-month-old commit and
wondering why it exists. If the commit says “implement session token
rotation” and the message also carries PAY-1734, you click
through to the ticket and get the full context. Without that thread,
you’re archaeologizing.
Standardize the boring parts — config template, naming conventions, MCP scoping, ticket integration — and your team’s GSD output will look consistent across repos even when different humans are driving different services. The interesting parts — what to plan, how to scope phases, when to escalate a workstream to a workspace — stay where they belong: in the hands of the people doing the work. Operational consistency and creative autonomy are not in tension. They live at different layers of the stack, and a well-set-up team separates them cleanly.
Runtime Compatibility Matrix
A consolidated lookup of every coding-agent runtime that GSD’s installer targets. Use this when you need to verify the install path, the verify command, or the workspace/MCP support level for a runtime without paging back through Part I.
| Runtime | Skills Format | Install Dir (global) | Verify | MCP | Workspace |
|---|---|---|---|---|---|
| Runtime | Skills Format | Install Dir (global) | Verify | MCP | Workspace |
| Claude Code 2.1.88+ | skills/gsd-*/SKILL.md |
~/.claude/skills/ |
/gsd:help |
yes | yes |
| Claude Code (legacy) | commands/gsd/*.md |
~/.claude/commands/ |
/gsd:help |
yes | yes |
| Codex | skills/gsd-*/SKILL.md |
~/.codex/skills/ |
$gsd-help |
partial | yes |
| Copilot | prompts + agents | ~/.github/ |
/gsd:help |
partial | no |
| Cursor | skills/gsd-*/SKILL.md |
~/.cursor/ |
/gsd:help |
yes | yes |
| Windsurf | markdown transform | ~/.windsurf/ |
/gsd:help |
partial | yes |
| OpenCode | config file | ~/.config/opencode/ |
/gsd-help |
yes | yes |
| Gemini CLI | skills | ~/.gemini/ |
/gsd:help |
yes | yes |
| Antigravity | skills | ~/.gemini/antigravity/ |
/gsd:help |
yes | yes |
| Cline | .clinerules |
~/.cline/ |
auto-loaded | no | partial |
| Augment | skills transform | ~/.augment/ |
/gsd:help |
partial | yes |
| Qwen Code | skills (open standard) | ~/.qwen/skills/ |
/gsd:help |
yes | yes |
Notes. The MCP column reflects whether
the host runtime exposes Model Context Protocol tool surfaces to its
agents. partial means MCP is supported via configuration
but the host’s discovery story is limited (e.g., Cline reads its rules
file but does not expose MCP tools the same way Claude Code does). The
Workspace column tracks GSD multi-repo workspace support;
partial means the runtime accepts GSD’s installed skill
files but does not run worktree-based phases.
Antigravity has both a global install directory and a per-project
./.agent/ directory. GSD’s installer auto-detects which to
use based on whether you run the install script with
–global or from inside a project directory.
MCP Server Quick-Reference
A scan-friendly table of MCP servers used in this guide, plus a
couple of honourable mentions you’ll reach for early. Each row lists the
package, the transport, the auth model, and the minimal config snippet
you’d drop into .mcp.json.
| Server | Package | Transport | Auth | Minimal Config Snippet |
|---|---|---|---|---|
| Google Stitch | @_davideast/stitch-mcp |
stdio (proxy) | OAuth / API key | {"command":"npx","args":["@_davideast/stitch-mcp","proxy"]} |
| Postgres | @modelcontextprotocol/
server-postgres |
stdio | connection string | {"command":"npx","args":["-y","@modelcontextprotocol/server-postgres"],"env":{"DATABASE_URL":"${DATABASE_URL}"}} |
| Filesystem | @modelcontextprotocol/
server-filesystem |
stdio | none (path scoped) | {"command":"npx","args":["-y","@modelcontextprotocol/server-filesystem","/abs/path"]} |
| GitHub | @modelcontextprotocol/
server-github |
stdio | PAT | {"command":"npx","args":["-y","@modelcontextprotocol/server-github"],"env":{"GITHUB_PERSONAL_ACCESS_TOKEN":"${GITHUB_TOKEN}"}} |
| Brave Search | @modelcontextprotocol/
server-brave-search |
stdio | API key | {"command":"npx","args":["-y","@modelcontextprotocol/server-brave-search"],"env":{"BRAVE_API_KEY":"${BRAVE_API_KEY}"}} |
Every snippet that contains a credential reference here uses an
environment variable placeholder. Drop the snippet into a real
.mcp.json only after exporting the corresponding variable
in your shell, and add .mcp.json to .gitignore
if it carries any credentials.
Choosing a transport. stdio is the right default for
tools that live next to your code. Use HTTP transport
(–transport http) for cloud services and shared team
servers; the GSD User Guide and the project’s
docs/USER-GUIDE.md cover the HTTP setup in more depth.
Workspace Command Cheat Sheet
The dense version. Print this page and tape it next to your monitor.
Workstreams (single-repo, isolated planning state)
/gsd:workstreams create <name> Create a new workstream
/gsd:workstreams switch <name> Switch the active workstream
/gsd:workstreams list Show all workstreams (active + archived)
/gsd:workstreams complete <name> Archive a workstream when done
State lives under .planning/workstreams/<name>/.
Switching is fast – GSD just swaps which subtree it reads.
Workspaces (multi-repo or full-isolation)
/gsd:new-workspace --name <name> --repos <repo1,repo2,...>
[--strategy worktree|clone]
/gsd:list-workspaces
/gsd:remove-workspace <name>
Strategies:
worktree(default) – git worktrees, lightweight, shares git history. Cannot have two worktrees on the same branch.clone– fullgit clone, complete isolation, higher disk cost. Use for spikes and throw-aways.
Workspaces live under ~/gsd-workspaces/<name>/
with a WORKSPACE.md manifest at the root.
/gsd:remove-workspace refuses removal if any constituent
repo has uncommitted changes. Commit, stash, or push first.
Relevant config keys
.planning/config.json:
workflow.use_worktrees: true | false
Per-phase parallel executor isolation via git worktrees.
workflow.cleanup_worktrees: true | false
Whether the orchestrator removes worktrees after phase completion.
project_code: "<prefix>"
Short prefix used in phase directory names and commit messages.
agent_skills:
executor: ["./skills/<dir>/", ...]
planner: ["./skills/<dir>/", ...]
researcher: ["./skills/<dir>/", ...]
Inject custom instruction directories into specific subagent types.
The single rule that makes parallel execution safe
The orchestrator owns STATE.md and ROADMAP.md. Parallel executor agents write only their plan-local SUMMARY.md. The orchestrator aggregates after each wave. Issue #1571 documents what happens when this rule is broken – don’t be the one who reintroduces it.