AI Agent Skills: Definition and Creation

A Comprehensive Technical Guide to the Agent Skills Specification

Research current as of: January 2026

1. Introduction to Agent Skills

Agent Skills represent a paradigm shift in how we extend AI agent capabilities [1: Anthropic, "Equipping agents for the real world with Agent Skills," October 2025]. Launched by Anthropic in October 2025 and released as an open standard in December 2025, Agent Skills provide a lightweight, standardized format for packaging specialized knowledge and workflows that AI agents can discover and load dynamically.

What Are Agent Skills?

Agent Skills are organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks [2: Arcade AI, "Skills vs Tools for AI Agents: Production Guide," January 2026]. They are modular capabilities packaged as Markdown files with YAML frontmatter, containing metadata and instructions that tell an agent how to perform a specific task.

  • 71,000+ community skills available
  • 20,000+ GitHub stars on the skills repo
  • ~50 tokens for metadata only
  • Dec 2025: open standard released

1.1 The Skills Timeline

October 16, 2025

Anthropic officially unveiled Agent Skills at their product launch event, introducing the concept of modular, discoverable agent capabilities.

December 18, 2025

Agent Skills released as an open standard at agentskills.io/specification, enabling cross-platform and cross-product reuse.

December 2025

OpenAI adopted the Agent Skills format for Codex CLI and ChatGPT, solidifying it as an industry standard.

January 2026

Major adoption by Microsoft (VS Code, GitHub), Cursor, Goose, Amp, OpenCode, and other leading AI development tools.

1.2 Why Skills Matter

Agent Skills solve several critical challenges in AI agent development:

Context Efficiency

Skills use progressive disclosure to manage context, loading only metadata (~50 tokens) at startup and full instructions (2,000-5,000 tokens) only when needed [1: Anthropic, "Equipping agents for the real world with Agent Skills," October 2025].

Knowledge Separation

Skills separate "know-how" (procedural knowledge) from "can-do" (tools/actions), enabling agents that are powerful, reliable, and compliant [4: Qin et al., "Tool Learning with Foundation Models," ACL 2024].

Reusability

Build once, use everywhere. Skills are version-controlled, shareable, and work across compatible agents like Cursor, Claude Code, and OpenCode.

Standardization

The open standard ensures interoperability across different agent platforms, preventing vendor lock-in and enabling ecosystem growth.

2. Skills vs Tools vs Plugins: A Technical Comparison

Understanding the distinction between Skills, Tools, and Plugins is crucial for effective agent design [5: Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools," NeurIPS 2023]. While these terms are sometimes used interchangeably, they represent fundamentally different approaches to extending AI capabilities.

2.1 Core Definitions

| Aspect | Skills | Tools | Plugins |
|---|---|---|---|
| Nature | Procedural knowledge and workflows | Executable functions with defined I/O | Vendor-specific extensions |
| Represents | What agents know | What agents can do | Platform-bound capabilities |
| Format | Markdown with YAML frontmatter | Code functions with schemas | Varies by platform |
| Activation | Suggested, agent decides when to load | Explicitly called by agent | Platform-specific invocation |
| Side Effects | None (instructions only) | Can modify external state | Varies |
| Portability | Cross-platform (open standard) | Varies (depends on protocol) | Platform-locked |
| Security Surface | Minimal (prompt-based) | Requires authentication & validation | Platform-dependent |
| Token Cost | Progressive (50-5000 tokens) | Fixed (schema definition) | Varies |

2.2 The Behavioral Difference

Key Insight: Skills are Suggestions, Tools are Actions

A skill is a suggestion - the agent autonomously decides whether it needs that context and loads it when appropriate. A tool is an action - the agent explicitly calls it to perform an operation with real-world effects.

2.3 The Relationship Between Skills and Tools

Skills and tools are complementary, not competitive. The most effective agents use both:

Best Practice: Combine Skills and Tools

For simple bots, tools might be enough. But if you're building digital employees, you need skills to encode domain expertise and tools to execute actions [2: Arcade AI, "Skills vs Tools for AI Agents: Production Guide," January 2026]. The separation ensures agents are not just powerful, but also reliable, compliant, and efficient.

2.4 When to Use Each

Use Skills For:

  • Domain-specific workflows
  • Multi-step procedures
  • Best practices and conventions
  • Template generation
  • Quality analysis patterns
  • Fast, repeatable workflows

Use Tools For:

  • Database queries
  • API calls
  • File system operations
  • Code execution
  • External service integration
  • State-modifying operations

Use Plugins For:

  • Quick platform-specific prototypes
  • Platform-bound capabilities
  • When portability isn't required
  • Legacy system integration

3. Agent Skills Architecture and Specification

3.1 Progressive Disclosure Architecture

The core innovation of Agent Skills is their three-phase progressive disclosure mechanism [1: Anthropic, "Equipping agents for the real world with Agent Skills," October 2025], which keeps agents fast while giving them access to extensive knowledge on demand.

Phase 1: Discovery (~50 tokens per skill) → Phase 2: Activation (2,000-5,000 tokens) → Phase 3: Execution (variable tokens)

The agent loads only what it needs, when it needs it.

Phase 1: Discovery (Startup)

At startup, the agent scans available skills directories (e.g., .claude/skills/) and parses only the YAML frontmatter from each SKILL.md file. This creates a lightweight index of available capabilities.

# Discovery phase loads only metadata
name: api-design
description: REST API design best practices and conventions

# Total: ~50 tokens loaded into context

Phase 2: Activation (On Match)

When a user's request matches a skill's description, the agent reads the full SKILL.md file into its context. The description acts as a semantic trigger for skill activation.

# When user asks: "Design a REST API for user management"
# Agent activates api-design skill
# Loads full SKILL.md: ~2,000-5,000 tokens

Agent reasoning:
  - Task: "Design a REST API"
  - Matches: api-design skill description
  - Action: Load full skill instructions

Phase 3: Execution (As Needed)

The agent follows the loaded instructions and accesses specific resources (scripts, templates, assets) within the skill folder only when the instructions reference them.

Performance Impact

This architecture allows agents to maintain hundreds of skills while keeping initial context usage minimal. For example, 100 skills × 50 tokens = 5,000 tokens at startup, versus 100 skills × 3,000 tokens = 300,000 tokens if all instructions were always loaded.
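
The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch: the function name `startup_tokens` is illustrative, and the per-skill figures are the approximate numbers quoted in this guide, not measurements.

```python
def startup_tokens(num_skills: int, tokens_per_skill: int) -> int:
    """Context tokens consumed at startup when every skill contributes tokens_per_skill."""
    return num_skills * tokens_per_skill


# Approximate per-skill costs quoted in this guide (assumed, not measured).
METADATA_TOKENS = 50
FULL_SKILL_TOKENS = 3_000

metadata_only = startup_tokens(100, METADATA_TOKENS)
everything = startup_tokens(100, FULL_SKILL_TOKENS)
savings_ratio = everything / metadata_only

print(f"metadata only: {metadata_only}, all instructions: {everything}, ratio: {savings_ratio:.0f}x")
```

With these figures, loading metadata only uses one sixtieth of the context that loading every skill in full would.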

3.2 Directory Structure Specification

A skill is a directory containing at minimum a SKILL.md file, with optional subdirectories for supporting resources:

my-skill/
├── SKILL.md              # Required: Instructions + metadata
├── scripts/              # Optional: Executable code
│   ├── setup.py
│   └── process.sh
├── references/           # Optional: Documentation
│   ├── api-spec.json
│   └── examples.md
└── assets/               # Optional: Templates, configs
    ├── template.yaml
    └── config.json

3.3 SKILL.md File Structure

Every SKILL.md file consists of two parts: YAML frontmatter (metadata) and Markdown body (instructions).

Frontmatter Fields

| Field | Required | Description | Constraints |
|---|---|---|---|
| name | Yes | Unique identifier for the skill | Max 64 chars, lowercase, numbers, hyphens |
| description | Yes | What the skill does and when to use it | Max 1024 chars, non-empty |
| license | No | License identifier (e.g., Apache-2.0) | SPDX identifier recommended |
| metadata | No | Additional info (author, version, etc.) | Freeform YAML |
| allowed-tools | No | Experimental: pre-approved tools | Space-delimited list |
| compatibility | No | System requirements, network needs | Freeform text |
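
The constraints on the two required fields are easy to check mechanically. The sketch below assumes the frontmatter has already been parsed into a dictionary; `validate_frontmatter` and `NAME_PATTERN` are hypothetical names, and the rules encode only what the table above states.

```python
import re

# Lowercase letters, digits, and hyphens only, per the constraints table.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def validate_frontmatter(fields: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the fields pass."""
    problems = []
    name = fields.get("name", "")
    description = fields.get("description", "")

    if not name:
        problems.append("name is required")
    elif len(name) > 64:
        problems.append("name exceeds 64 characters")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append("name may contain only lowercase letters, numbers, and hyphens")

    if not description.strip():
        problems.append("description is required and must be non-empty")
    elif len(description) > 1024:
        problems.append("description exceeds 1024 characters")

    return problems
```

For example, `validate_frontmatter({"name": "API Design", "description": ""})` reports one problem for the name and one for the empty description.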

Critical: Description Quality

The description is the most important field - it's what the LLM uses to decide which skill to activate [6: Wei et al., "Symbol Tuning Improves In-Context Learning in Language Models," EMNLP 2023]. Be specific and clear about what the skill does AND when to use it. Poor descriptions lead to skills never being activated or being activated incorrectly.

3.4 Integration Approaches

There are two main approaches to integrating skills into agent systems:

Filesystem-Based Integration

The agent operates within a computer environment (bash/unix) where skills are activated when models issue shell commands:

# Agent activates skill via filesystem
cat /path/to/my-skill/SKILL.md

# Access bundled resources
python /path/to/my-skill/scripts/process.py
cat /path/to/my-skill/references/api-spec.json

Tool-Based Integration

The agent functions without a dedicated computer environment and instead implements tools allowing models to trigger skills:

// Agent activates skill via tool call
{
  "tool": "activate_skill",
  "skill_name": "my-skill"
}

// Access bundled resources via tool
{
  "tool": "read_skill_asset",
  "skill_name": "my-skill",
  "asset_path": "scripts/process.py"
}
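
On the host side, the two tool calls above could be served by a small dispatcher. This is a sketch under assumptions: the tool names `activate_skill` and `read_skill_asset` come from the illustrative payloads above, not from a published API, and `handle_tool_call` and `skills_root` are hypothetical. The path check matters because `asset_path` is supplied by the model.

```python
from pathlib import Path


def activate_skill(skills_root: Path, skill_name: str) -> str:
    """Return the full SKILL.md text for the named skill."""
    return (skills_root / skill_name / "SKILL.md").read_text(encoding="utf-8")


def read_skill_asset(skills_root: Path, skill_name: str, asset_path: str) -> str:
    """Return a bundled file, refusing paths that escape the skill directory."""
    skill_dir = (skills_root / skill_name).resolve()
    target = (skill_dir / asset_path).resolve()
    if skill_dir not in target.parents:
        raise ValueError(f"{asset_path!r} is outside the skill directory")
    return target.read_text(encoding="utf-8")


def handle_tool_call(skills_root: Path, call: dict) -> str:
    """Dispatch a tool-call payload shaped like the JSON examples above."""
    if call["tool"] == "activate_skill":
        return activate_skill(skills_root, call["skill_name"])
    if call["tool"] == "read_skill_asset":
        return read_skill_asset(skills_root, call["skill_name"], call["asset_path"])
    raise ValueError(f"unknown tool: {call['tool']}")
```

A production handler would likely validate `skill_name` the same way; the sketch guards only the asset path.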

4. Creating Agent Skills: Complete Examples

4.1 Basic Skill Template

Here's a minimal SKILL.md file demonstrating the required structure:

---
name: skill-name
description: A clear, concise description of what this skill does and when to use it
---

# Skill Name

Brief overview of the skill's purpose.

## Instructions

Step-by-step instructions for the agent to follow:

1. First action
2. Second action
3. Third action

## Examples

Example scenarios showing the skill in action.

4.2 Production Example: API Design Skill

A real-world skill for REST API design best practices:

---
name: api-design
description: REST API design best practices and conventions. Use when designing or reviewing REST APIs.
license: Apache-2.0
metadata:
  author: API Standards Team
  version: 1.2.0
  updated: 2026-01-15
---

# API Design Guidelines

Follow these conventions when designing REST APIs to ensure consistency,
scalability, and developer-friendly interfaces.

## URL Structure

- Use plural nouns for resources: `/users`, `/orders`
- Use kebab-case for multi-word resources: `/order-items`
- Nest related resources: `/users/{id}/orders`
- Keep URLs shallow (max 3 levels deep)
- Avoid verbs in URLs (use HTTP methods instead)

## HTTP Methods

- **GET**: Retrieve resources (safe, idempotent)
- **POST**: Create new resources (not idempotent)
- **PUT**: Replace entire resource (idempotent)
- **PATCH**: Partial update (not necessarily idempotent)
- **DELETE**: Remove resource (idempotent)

## Response Codes

### Success Codes
- `200 OK`: Successful GET, PUT, PATCH, or DELETE
- `201 Created`: Successful POST (include Location header)
- `204 No Content`: Successful DELETE with no response body

### Client Error Codes
- `400 Bad Request`: Invalid request syntax or parameters
- `401 Unauthorized`: Missing or invalid authentication
- `403 Forbidden`: Valid auth but insufficient permissions
- `404 Not Found`: Resource doesn't exist
- `409 Conflict`: Request conflicts with current state
- `422 Unprocessable Entity`: Validation errors

### Server Error Codes
- `500 Internal Server Error`: Unexpected server error
- `503 Service Unavailable`: Temporary unavailability

## Request/Response Format

All requests and responses should use JSON with consistent structure:

```json
{
  "data": { ... },
  "meta": {
    "timestamp": "2026-01-30T10:30:00Z",
    "version": "1.0"
  },
  "errors": []
}
```

## Pagination

For list endpoints, use cursor-based pagination:

```json
{
  "data": [...],
  "pagination": {
    "next_cursor": "abc123",
    "prev_cursor": "def456",
    "has_more": true
  }
}
```

## Versioning

- Use URL versioning: `/v1/users`
- Maintain at least 2 versions concurrently
- Announce deprecation 6 months in advance

## Error Handling

Always return structured error responses:

```json
{
  "errors": [
    {
      "code": "VALIDATION_ERROR",
      "message": "Email address is invalid",
      "field": "email"
    }
  ]
}
```

## When Designing a New API

1. Identify all resources and their relationships
2. Define URL structure following conventions above
3. Map operations to HTTP methods
4. Design request/response schemas
5. Document all endpoints with examples
6. Review with API Standards Team

4.3 Production Example: Codebase Visualizer

A skill that generates interactive HTML tree visualizations of project structure:

---
name: codebase-visualizer
description: Generate an interactive collapsible tree visualization of your codebase. Use when exploring a new repo, understanding project structure, or identifying large files.
allowed-tools: Bash(python *)
compatibility: Requires Python 3.8+
---

# Codebase Visualizer

Generate an interactive HTML tree view that shows your project's file structure
with collapsible directories, file sizes, and syntax highlighting.

## Usage

Run the visualization script from your project root:

```bash
python ~/.claude/skills/codebase-visualizer/scripts/visualize.py .
```

This will generate `codebase-tree.html` in the current directory.

## Options

- `--exclude`: Patterns to exclude (default: node_modules, .git, __pycache__)
- `--max-depth`: Maximum directory depth (default: unlimited)
- `--output`: Output file name (default: codebase-tree.html)

## Example

```bash
python ~/.claude/skills/codebase-visualizer/scripts/visualize.py . --exclude "*.pyc,dist,build" --max-depth 5
```

## Output

The generated HTML includes:
- Collapsible directory tree
- File size indicators
- File type icons
- Search functionality
- Dark/light theme toggle

4.4 Production Example: Git Release Management

A skill for creating consistent releases and changelogs:

---
name: git-release
description: Create consistent Git releases and changelogs. Use when preparing version releases, generating changelogs, or tagging releases.
allowed-tools: Bash(git *)
---

# Git Release Management

Creates consistent releases and changelogs by analyzing merged PRs,
proposing version bumps, and generating release notes.

## Semantic Versioning

Follow semantic versioning (MAJOR.MINOR.PATCH):

- **MAJOR**: Breaking changes (incompatible API changes)
- **MINOR**: New features (backward-compatible)
- **PATCH**: Bug fixes (backward-compatible)

## Release Process

1. **Analyze Changes**
   - Review commits since last release
   - Identify breaking changes, features, and fixes
   - Determine appropriate version bump

2. **Generate Changelog**
   - Group changes by type (Breaking, Features, Fixes)
   - Extract PR titles and numbers
   - Include contributor credits

3. **Create Release**
   - Update version in relevant files
   - Commit changelog
   - Create Git tag
   - Push tag to remote

4. **Publish Release Notes**
   - Use changelog content
   - Include upgrade instructions if breaking
   - Link to detailed documentation

## Changelog Format

```markdown
# Changelog

## [2.1.0] - 2026-01-30

### Breaking Changes
- Removed deprecated `oldMethod()` API (#123)

### Features
- Added new authentication flow (#124)
- Improved performance of data processing (#125)

### Bug Fixes
- Fixed memory leak in background worker (#126)
- Corrected timezone handling (#127)

### Contributors
@username1, @username2, @username3
```

## Commands

```bash
# Create a new release
git tag -a v2.1.0 -m "Release v2.1.0"
git push origin v2.1.0

# Generate changelog
git log v2.0.0..HEAD --pretty=format:"%s (%h)" --merges
```

4.5 Production Example: HR Questions Processing

An enterprise skill for handling HR-related queries:

---
name: hr-questions
description: Answers HR-related questions including policies, benefits, leave requests, onboarding, and employee guidelines. Use for any HR or people operations questions.
metadata:
  department: Human Resources
  sensitivity: confidential
  version: 3.0.0
---

# HR Questions Processing

Provides accurate, policy-compliant answers to employee HR questions
across benefits, policies, leave management, and onboarding.

## Question Categories

### Benefits
- Health insurance coverage and enrollment
- 401(k) contribution limits and matching
- PTO accrual and usage policies
- Parental leave policies
- Tuition reimbursement

### Policies
- Code of conduct
- Remote work policies
- Expense reimbursement
- Equipment policies
- Confidentiality agreements

### Leave Management
- Sick leave
- Vacation time
- Personal days
- FMLA eligibility
- Bereavement leave

### Onboarding
- New hire checklist
- First day procedures
- System access requests
- Training requirements

## Response Guidelines

1. **Always cite policy source**: Reference the specific policy document
2. **Include effective dates**: Policies may have changed
3. **Escalate sensitive issues**: Direct to HR for personal situations
4. **Maintain confidentiality**: Never share other employees' information
5. **Stay current**: Refer to the references/ directory for the latest policies

## Example Interaction

**Question**: "How much PTO do I accrue per year?"

**Answer**:
According to the PTO Policy (effective 2026-01-01):
- 0-2 years: 15 days per year (1.25 days/month)
- 3-5 years: 20 days per year (1.67 days/month)
- 6+ years: 25 days per year (2.08 days/month)

PTO accrues monthly and can be used as soon as it's available.
Maximum carryover is 5 days per year.

For questions about your specific accrual, contact hr@company.com.

## Escalation Scenarios

Immediately direct to HR for:
- Harassment or discrimination concerns
- Performance improvement plans
- Termination questions
- Salary negotiations
- Medical accommodations
- Legal matters

5. Skill Discovery and Loading Mechanisms

5.1 Discovery Phase Architecture

Skill discovery is the foundation of efficient skill management: the agent indexes each skill's metadata at startup and surfaces only the relevant skills, so the context window is not overwhelmed.

Filesystem-Based Discovery

Agents scan designated skills directories during initialization:

# Common skill directory locations
~/.claude/skills/          # Claude Code
~/.cursor/skills/          # Cursor
~/.config/opencode/skills/ # OpenCode
./skills/                  # Project-specific skills

# Discovery process
1. Scan directories for SKILL.md files
2. Parse YAML frontmatter
3. Extract name + description
4. Build in-memory index
5. Total tokens: ~50 per skill
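
The discovery steps above can be sketched with the standard library alone. This is not any agent's actual implementation: `read_frontmatter` and `discover_skills` are hypothetical names, and the parser is deliberately simplified to flat `key: value` fields, where a real system would use a proper YAML parser.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Extract top-level `key: value` pairs from the frontmatter block.

    Simplification: handles only flat scalar fields such as name and
    description; nested fields (e.g. metadata) need a real YAML parser.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # skip nested or malformed lines
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def discover_skills(*roots: Path) -> dict[str, str]:
    """Build a name -> description index from every <root>/<skill>/SKILL.md."""
    index = {}
    for root in roots:
        root = root.expanduser()
        if not root.is_dir():
            continue  # a configured location may simply not exist
        for skill_md in sorted(root.glob("*/SKILL.md")):
            meta = read_frontmatter(skill_md)
            if meta.get("name") and meta.get("description"):
                index[meta["name"]] = meta["description"]
    return index
```

Calling `discover_skills(Path("~/.claude/skills"), Path("./skills"))` would index both a user-level and a project-level directory; only names and descriptions are kept in memory, which is what keeps the startup cost small.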

Tool-Based Discovery

For agents without filesystem access, discovery happens via dedicated tools:

{
  "tool": "list_skills",
  "response": [
    {
      "name": "api-design",
      "description": "REST API design best practices..."
    },
    {
      "name": "git-release",
      "description": "Create consistent releases..."
    }
  ]
}

5.2 Semantic Matching

The agent uses the skill descriptions for semantic matching against user requests [8: Shen et al., "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends," NeurIPS 2023]. This approach mirrors the task-routing pattern from multi-model orchestration research:

User Request ("Design a REST API") → Semantic Analysis (embedding similarity) → Match Skills (api-design: 0.92) → Activate (load full SKILL.md)
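
To make the matching flow concrete, here is a toy scorer. It swaps in bag-of-words cosine similarity as a crude stand-in for embedding similarity, so the scores will not resemble those of a real embedding model; in many agents the language model itself makes this decision from the descriptions. The names `vectorize`, `cosine`, `match_skills`, and the `threshold` value are all illustrative.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_skills(request: str, index: dict[str, str], threshold: float = 0.2) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above threshold, best match first."""
    query = vectorize(request)
    scored = [(name, cosine(query, vectorize(desc))) for name, desc in index.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


index = {
    "api-design": "REST API design best practices and conventions",
    "git-release": "Create consistent Git releases and changelogs",
}
matches = match_skills("Design a REST API", index)
```

Here only `api-design` clears the threshold, because the request shares no words with the `git-release` description.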

Optimization Tip: Description Engineering

Craft skill descriptions to maximize semantic matching accuracy. Include key terms, synonyms, and use cases. For example: "REST API design best practices and conventions. Use when designing, reviewing, or documenting RESTful APIs, web services, or HTTP endpoints."

5.3 Loading Strategies

Eager Loading (Anti-Pattern)

Loading all skill instructions at startup - wasteful and slow:

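# DON'T DO THIS: every skill's full instructions enter context at startup
from pathlib import Path

def startup(skill_dirs: list[Path]) -> dict[str, str]:
    # 100 skills × 3000 tokens = 300k tokens!
    return {d.name: (d / "SKILL.md").read_text(encoding="utf-8") for d in skill_dirs}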

Lazy Loading (Recommended)

Load instructions only when skills are activated:

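# RECOMMENDED APPROACH
from pathlib import Path

def load_metadata_only(skill_dir: Path) -> str:
    # Return only the YAML frontmatter (the text between the first two "---" markers)
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return text.split("---", 2)[1].strip()

def startup(skill_dirs: list[Path]) -> dict[str, str]:
    # 100 skills × 50 tokens = 5k tokens
    return {d.name: load_metadata_only(d) for d in skill_dirs}

def on_user_request(request, index, skill_dirs, semantic_match):
    # semantic_match(request, index) is supplied by the host agent and returns skill names
    matching = set(semantic_match(request, index))
    # Only 1-3 skills typically
    return {d.name: (d / "SKILL.md").read_text(encoding="utf-8")
            for d in skill_dirs if d.name in matching}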

Predictive Preloading (Advanced)

Some systems preload likely-needed skills based on conversation context:

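# ADVANCED: Predictive preloading
PRELOAD_BY_PROJECT_TYPE = {
    "web_app": ["api-design", "security-best-practices", "database-schema"],
}

def on_conversation_start(project_type: str) -> list[str]:
    # Names of skills to preload; unknown project types preload nothing
    return PRELOAD_BY_PROJECT_TYPE.get(project_type, [])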

5.4 Skill Priority and Ranking

When multiple skills match a request, agents use ranking to determine activation order:

Semantic Similarity

Primary ranking factor. Skills with descriptions most similar to the user request score highest.

Recency

Recently updated skills may be prioritized to ensure agents use current best practices.

Usage Frequency

Frequently used skills can be ranked higher based on historical activation patterns.

Explicit Prioritization

Some systems allow manual priority settings in metadata for critical organizational skills.
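
One way to combine the four factors above is a weighted score. The sketch below is purely illustrative: the weights, the 30-day recency decay, and the `priority` field are assumptions chosen for the example, not values taken from any agent platform.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    similarity: float       # 0..1, from semantic matching
    days_since_update: int
    activations: int        # historical activation count
    priority: int = 0       # explicit boost from metadata (hypothetical field)


def rank(candidates: list[Candidate]) -> list[str]:
    """Order candidates by a weighted score; the weights are illustrative only."""
    max_uses = max((c.activations for c in candidates), default=0) or 1

    def score(c: Candidate) -> float:
        recency = 1.0 / (1.0 + c.days_since_update / 30)  # decays over months
        usage = c.activations / max_uses
        return 0.7 * c.similarity + 0.1 * recency + 0.1 * usage + 0.1 * c.priority

    return [c.name for c in sorted(candidates, key=score, reverse=True)]
```

Keeping similarity as the dominant weight reflects its role as the primary ranking factor; the other terms mostly break ties, unless an explicit priority is set.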

6. Skill Lifecycle Management

6.1 The Skill Development Lifecycle

Professional skill management requires a structured lifecycle approach [9: Tessl, "Announcing skills on Tessl: the package manager for agent skills," January 2026]. Emerging platforms now provide comprehensive lifecycle management tools:

1. Design

Identify knowledge gaps, define scope, write skill specification, determine required tools and resources.

2. Implementation

Write SKILL.md with metadata and instructions, create supporting scripts and resources, test with target agent platforms.

3. Evaluation

Test skill activation accuracy, validate instruction clarity, measure token efficiency, gather user feedback.

4. Deployment

Version control with Git, distribute to agent environments, document usage and examples, monitor activation patterns.

5. Monitoring

Track skill usage metrics, identify activation failures, collect agent performance data, assess outcome quality.

6. Optimization

Refine descriptions for better matching, improve instruction clarity, reduce token count, update for new best practices.

6.2 Versioning and Compatibility

Skills require versioning strategies to manage evolution over time:

Version Metadata

---
name: api-design
description: REST API design best practices
metadata:
  version: 2.1.0
  min_agent_version: 1.5.0
  updated: 2026-01-30
  changelog: Added GraphQL guidelines
---

Versioning Strategies

Directory Versioning

Maintain multiple versions in separate directories:

skills/
├── api-design-v1/
├── api-design-v2/
└── api-design-v3/

Git Branch Versioning

Use Git branches for version management:

main (latest stable)
v2.x (maintenance)
v1.x (deprecated)

Metadata Versioning

Single skill with version metadata, agent selects based on compatibility.

Backward Compatibility Challenge

There's currently no built-in versioning system in the Agent Skills specification, so organizations must implement their own versioning strategies. Best practice: use semantic versioning in metadata and keep at least one previous major version available for backward compatibility.
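
Because versioning is left to each organization, a compatibility check has to be written by hand. The sketch below assumes plain `MAJOR.MINOR.PATCH` strings and the custom `min_agent_version` metadata key used in the example above; neither is defined by the specification, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(agent_version: str, min_agent_version: str) -> bool:
    """True when the agent meets the skill's declared minimum version."""
    return parse_version(agent_version) >= parse_version(min_agent_version)


def is_breaking_upgrade(old: str, new: str) -> bool:
    """Under semantic versioning, a MAJOR bump signals incompatible changes."""
    return parse_version(new)[0] > parse_version(old)[0]
```

Comparing parsed tuples rather than raw strings matters: as strings, "1.10.0" sorts before "1.5.0", which would wrongly reject a newer agent.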

6.3 Quality Assurance

Ensuring skill quality requires systematic evaluation:

Key Quality Metrics

| Metric | Target | Measurement |
|---|---|---|
| Activation Accuracy | > 90% | Skill activated when appropriate |
| False Positive Rate | < 5% | Skill activated inappropriately |
| Token Efficiency | < 5000 tokens | Full SKILL.md token count |
| Instruction Clarity | > 95% | Agent follows instructions correctly |
| Outcome Quality | > 85% | Task completed successfully |
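
The first two metrics can be computed from a labeled set of test queries. The sketch below takes one reasonable reading of the table: activation accuracy is the share of relevant queries on which the skill activated, and false positive rate is the share of irrelevant queries on which it activated anyway. The function name and the input format are assumptions.

```python
def activation_metrics(cases: list[tuple[bool, bool]]) -> dict[str, float]:
    """Each case is (should_activate, did_activate) for one test query."""
    relevant = [did for should, did in cases if should]
    irrelevant = [did for should, did in cases if not should]
    return {
        # Share of relevant queries where the skill activated.
        "activation_accuracy": sum(relevant) / len(relevant) if relevant else 0.0,
        # Share of irrelevant queries where the skill activated anyway.
        "false_positive_rate": sum(irrelevant) / len(irrelevant) if irrelevant else 0.0,
    }
```

A test set of 10 relevant and 20 irrelevant queries with one miss and one spurious activation gives 90% accuracy and a 5% false positive rate, right at the targets listed above.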

6.4 Skill Deprecation

When skills become outdated or are superseded, follow a structured deprecation process:

---
name: old-api-design
description: "DEPRECATED: Use api-design-v2 instead. Legacy REST API guidelines."
metadata:
  deprecated: true
  deprecation_date: 2026-01-01
  replacement: api-design-v2
  removal_date: 2026-07-01
---

# DEPRECATED: Old API Design

This skill is deprecated and will be removed on 2026-07-01.

**Use instead**: api-design-v2

This skill is maintained for backward compatibility only.

6.5 Enterprise Lifecycle Management Platforms

Several platforms have emerged to manage the full skill lifecycle:

Tessl: Skills Package Manager

In January 2026, Tessl announced a developer-grade package manager for skills [9: Tessl, "Announcing skills on Tessl: the package manager for agent skills," January 2026], providing tools to evaluate quality, a registry of evaluated skills, and a platform to manage the full lifecycle: build, evaluate, distribute, and optimize.

Key Challenges Solved

  • Quality visibility
  • Knowledge staleness detection
  • Performance degradation tracking
  • Version management

Platform Features

  • Automated skill evaluation
  • Registry with quality scores
  • Distribution management
  • Performance monitoring

7. Skill Repositories and Marketplaces

7.1 Official Repositories

Anthropic Skills Repository

The official repository maintained by Anthropic serves as the reference implementation.

OpenAI Codex Skills

OpenAI adopted the Agent Skills standard in December 2025.

7.2 Community Marketplaces

SkillsMP - The Agent Skills Marketplace

Launched in December 2025, SkillsMP is the first comprehensive marketplace for agent skills:

  • 71,000+ skills available
  • GitHub source integration
  • Smart category filtering
  • Quality score indicators

7.3 Notable Community Collections

Context Engineering Skills

github.com/muratcankoylan/Agent-Skills-for-Context-Engineering

Comprehensive collection for context engineering, multi-agent architectures, and production systems.

Awesome Agent Skills

github.com/skillmatic-ai/awesome-agent-skills

Curated list of high-quality agent skills across domains, with quality ratings and usage examples.

Enterprise Skills Collection

Partners include Atlassian (Jira/Confluence), Figma, Canva, Stripe, and Zapier.

Production-grade skills for enterprise integrations.

7.4 Platform Adoption

The Agent Skills standard has been adopted across major AI development platforms:

| Platform | Adoption Date | Integration Level | Skill Directory |
|---|---|---|---|
| Claude Code | Oct 2025 | Native | ~/.claude/skills/ |
| Cursor | Dec 2025 | Native | ~/.cursor/skills/ |
| VS Code Copilot | Dec 2025 | Native (Microsoft) | .vscode/skills/ |
| GitHub Copilot | Dec 2025 | Native (Microsoft) | .github/skills/ |
| OpenCode | Dec 2025 | Native | ~/.opencode/skills/ |
| Goose | Dec 2025 | Native | ~/.goose/skills/ |
| Amp | Jan 2026 | Native | ~/.amp/skills/ |
| Letta | Jan 2026 | Native | ~/.letta/skills/ |

Industry Impact

The rapid adoption across platforms demonstrates the industry's recognition of Agent Skills as a unifying standard [15: OpenAI, "OpenAI Function Calling Guide," 2024-2025]. This cross-platform compatibility enables organizations to invest in skill development once and deploy everywhere.

8. Best Practices and Design Patterns

8.1 Description Engineering

The skill description is the most critical component - it determines when and how often your skill activates.

Bad Description

description: "API design"

Too vague, won't match user requests effectively.

Good Description

description: "REST API design best practices and conventions. Use when designing, reviewing, or documenting RESTful APIs."

Specific, includes key terms and use cases.

8.2 Common Design Patterns

Template Generation Pattern

Skills that generate standardized outputs from templates:

# Use cases: Report generation, boilerplate code, documentation

1. Gather required information from user
2. Select appropriate template from assets/
3. Populate template with user data
4. Validate output format
5. Present generated content

Iterative Analysis Pattern

Multi-pass processes with increasing depth:

# Use cases: Code review, security audits, quality analysis

1. Broad initial scan - identify areas of interest
2. Medium-depth analysis - examine flagged areas
3. Deep dive - detailed investigation of issues
4. Synthesis - compile findings into report

Sequential Pipeline Pattern

Linear, deterministic workflows where each step depends on the previous:

# Use cases: Data processing, CI/CD, deployment

1. Validation - ensure prerequisites met
2. Processing - execute core operations
3. Verification - confirm success
4. Cleanup - finalize and document

Decision Tree Pattern

Branching logic based on conditions:

# Use cases: Troubleshooting, configuration, routing

1. Assess initial conditions
2. Branch based on criteria:
   - If condition A: follow path 1
   - If condition B: follow path 2
   - Otherwise: follow default path
3. Execute path-specific instructions
4. Converge at output step

8.3 Token Optimization

Keep Instructions Concise

Target Limits

  • Metadata: ~50 tokens
  • Full SKILL.md: < 5,000 tokens
  • Recommended: < 500 lines

Optimization Techniques

  • Use bullet points over paragraphs
  • Remove redundant words
  • Reference external files for details
  • Use code examples sparingly

External Resources

For extensive documentation, use references/ directory:

my-skill/
├── SKILL.md              # Keep concise (~2000 tokens)
└── references/
    ├── detailed-guide.md # Full documentation
    └── examples.md       # Extensive examples

# In SKILL.md:
For detailed examples, see references/examples.md
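
A rough size check can flag a SKILL.md that is drifting past the targets above. This sketch estimates tokens at about four characters each, which is a common rule of thumb for English text and not a real tokenizer; `check_skill_size` and its defaults are illustrative.

```python
def check_skill_size(skill_md_text: str, max_tokens: int = 5000, max_lines: int = 500) -> dict:
    """Rough size check; the 4-characters-per-token ratio is a heuristic, not a tokenizer."""
    estimated_tokens = len(skill_md_text) // 4
    line_count = len(skill_md_text.splitlines())
    return {
        "estimated_tokens": estimated_tokens,
        "lines": line_count,
        "within_limits": estimated_tokens < max_tokens and line_count < max_lines,
    }
```

For an exact count, the estimate would be replaced with the tokenizer of the target model.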

8.4 Modularity and Composability

Single Responsibility

Each skill should have one clear purpose [11: Zhou et al., "LIMA: Less Is More for Alignment," NeurIPS 2023]. LIMA found that a small, carefully curated training set can rival much larger ones; by analogy, tightly curated skill content tends to serve agents better than verbose instructions.

Skill Composition

Skills can reference other skills for complex workflows [12: Wu et al., "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation," 2024]. Multi-agent frameworks demonstrate that skill distribution across agents enables emergent problem-solving capabilities:

---
name: full-api-review
description: Complete API review covering design, security, and documentation
---

# Full API Review

This skill orchestrates multiple specialized skills for comprehensive review:

1. Activate api-design skill
   - Review URL structure, HTTP methods, response codes

2. Activate api-security skill
   - Review authentication, authorization, data validation

3. Activate api-documentation skill
   - Review OpenAPI spec, examples, error documentation

4. Compile findings into unified report

8.5 Security Considerations

Trust and Verification

It's strongly recommended to use Skills only from trusted sources. Skills provide Claude with new capabilities through instructions and code, which makes them powerful but also means a malicious Skill can direct Claude to invoke tools or execute code in ways that don't match the Skill's stated purpose.

Security Best Practices

Using allowed-tools

---
name: safe-file-processor
description: Process files safely
allowed-tools: Bash(python scripts/process.py) Read Write
---

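# With this frontmatter, the skill is pre-approved to:
# - Run one specific Python script
# - Read files
# - Write files
# Anything else (arbitrary commands, network access, etc.) is not pre-approved.
# The field is experimental, so enforcement may vary between agents.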
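
An `allowed-tools` value is space-delimited, but entries such as `Bash(python scripts/process.py)` contain spaces inside the parentheses, so a plain `split()` would break them apart. The sketch below shows one way to tokenize the string; the grammar it assumes (a tool name with an optional parenthesized pattern) is inferred from the examples in this guide, and `parse_allowed_tools` is a hypothetical helper.

```python
import re

# A tool name optionally followed by a parenthesized pattern, e.g. Bash(git *).
TOOL_PATTERN = re.compile(r"[A-Za-z_][\w-]*(?:\([^)]*\))?")


def parse_allowed_tools(value: str) -> list[str]:
    """Split an allowed-tools string without breaking on spaces inside parentheses."""
    return TOOL_PATTERN.findall(value)


tools = parse_allowed_tools("Bash(python scripts/process.py) Read Write")
```

Here `tools` holds three entries: the full `Bash(...)` pattern, `Read`, and `Write`.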

8.6 Testing and Validation

Test Scenarios

Before deploying a skill, test these scenarios [9: Tessl, "Announcing skills on Tessl: the package manager for agent skills," January 2026]:

  1. Activation Test: Does the skill activate for relevant queries?
  2. False Positive Test: Does it activate for irrelevant queries?
  3. Instruction Following: Does the agent follow instructions correctly?
  4. Error Handling: How does it handle edge cases and errors?
  5. Tool Integration: Do referenced tools work as expected?
  6. Cross-Platform: Does it work on all target agent platforms?

Validation Checklist

□ Name is unique and follows naming convention (lowercase, hyphens)
□ Description is clear, specific, and includes use cases
□ Instructions are concise and actionable
□ Code examples are syntactically correct
□ Referenced files exist in correct directories
□ Token count is under 5,000 for full SKILL.md
□ Tested on target agent platforms
□ Security review completed for scripts/
□ Documentation includes examples
□ Metadata includes version and author
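
Several checklist items can be checked mechanically. The following is a minimal sketch that covers the name, the description, and the token budget. It assumes flat `key: value` frontmatter and approximates tokens as four characters each, which is a rough heuristic rather than a real tokenizer. All function names are hypothetical.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
MAX_TOKENS = 5000  # budget from the checklist above


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into flat 'key: value' frontmatter and the Markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def validate_skill(text: str) -> list[str]:
    """Return a list of checklist violations (empty means these checks passed)."""
    meta, _body = split_frontmatter(text)
    problems = []
    if not NAME_PATTERN.fullmatch(meta.get("name", "")):
        problems.append("name must use lowercase letters, digits, and hyphens")
    if not meta.get("description"):
        problems.append("description is missing")
    if len(text) / 4 > MAX_TOKENS:  # rough estimate: ~4 characters per token
        problems.append("SKILL.md likely exceeds the 5,000-token budget")
    return problems


good = "---\nname: safe-file-processor\ndescription: Process files safely\n---\n\n# Steps\n"
bad = "---\nname: Bad_Name\n---\n\n# Steps\n"
print(validate_skill(good))
print(validate_skill(bad))
```

The remaining items, such as platform testing and security review, still need human judgment or platform-specific tooling.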

9. Advanced Topics

9.1 Multi-Language Skills

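Skills can support multiple languages for global teams, for example by shipping locale-suffixed variants alongside the default SKILL.md (a layout convention rather than something the specification defines):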

multilingual-skill/
├── SKILL.md              # English (default)
├── SKILL.es.md          # Spanish
├── SKILL.fr.md          # French
└── SKILL.ja.md          # Japanese

# Agent selects based on user's language preference
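
How an agent picks among these files is implementation-specific. The sketch below shows one plausible selection rule, assuming the `SKILL.<lang>.md` naming from the layout above and a fallback to the default `SKILL.md`. The function name and the fallback behavior are assumptions.

```python
import re
import tempfile
from pathlib import Path


def select_skill_file(skill_dir: Path, language: str) -> Path:
    """Pick SKILL.<lang>.md if it exists, otherwise fall back to SKILL.md."""
    primary = re.split(r"[-_]", language)[0].lower()  # "es-MX" -> "es"
    candidate = skill_dir / f"SKILL.{primary}.md"
    return candidate if candidate.is_file() else skill_dir / "SKILL.md"


# Demonstrate with a throwaway directory containing two variants.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for filename in ("SKILL.md", "SKILL.es.md"):
        (root / filename).write_text("# placeholder\n", encoding="utf-8")
    spanish = select_skill_file(root, "es-MX").name
    german = select_skill_file(root, "de").name

print(spanish, german)
```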

9.2 Dynamic Skills

Some advanced systems generate skills dynamically based on runtime context [13, Wang et al., 2024]. Research demonstrates that LLMs can create their own tools and skills as reusable knowledge artifacts:

// Generate a project-specific skill from codebase analysis.
// Illustrative only: extractPatterns and generateSkill are hypothetical helpers.
function analyzeCodebase(codebase) {
  const conventions = extractPatterns(codebase);
  return generateSkill({
    name: "project-conventions",
    description: "Project-specific coding conventions",
    instructions: conventions,
  });
}
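
The generation step can be sketched in Python as a function that renders a SKILL.md from already-extracted conventions. How conventions are extracted from a codebase is out of scope here. The `render_skill` name and the bullet-list body layout are assumptions made for illustration.

```python
def render_skill(name: str, description: str, instructions: list[str]) -> str:
    """Render SKILL.md text: YAML frontmatter followed by a Markdown body."""
    title = name.replace("-", " ").title()
    steps = "\n".join(f"- {line}" for line in instructions)
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {title}\n\n"
        f"{steps}\n"
    )


skill_text = render_skill(
    name="project-conventions",
    description="Project-specific coding conventions",
    instructions=["Use snake_case for function names", "Keep modules under 500 lines"],
)
print(skill_text)
```

Generated skills are best treated like any other untrusted skill: review them before they are made discoverable to an agent.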

9.3 Skill Analytics

Track skill performance metrics for optimization:

| Metric | Purpose | Action Threshold |
|---|---|---|
| Activation Rate | How often skill is used | < 1/month → Consider deprecation |
| Success Rate | Task completion percentage | < 80% → Refine instructions |
| False Positive Rate | Inappropriate activations | > 10% → Improve description |
| Average Token Usage | Context efficiency | > 5,000 tokens → Optimize content |
| User Satisfaction | Quality perception | < 4/5 → Review and update |
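
These thresholds translate directly into a rule check. The following is a minimal sketch, assuming metrics are already being collected. The `SkillMetrics` fields and function name are hypothetical, while the thresholds and actions are taken from the table above.

```python
from dataclasses import dataclass


@dataclass
class SkillMetrics:
    activations_per_month: float
    success_rate: float         # fraction between 0.0 and 1.0
    false_positive_rate: float  # fraction between 0.0 and 1.0
    average_tokens: float
    satisfaction: float         # 1-5 scale


def recommended_actions(m: SkillMetrics) -> list[str]:
    """Apply the action thresholds from the table and return the triggered actions."""
    actions = []
    if m.activations_per_month < 1:
        actions.append("Consider deprecation")
    if m.success_rate < 0.80:
        actions.append("Refine instructions")
    if m.false_positive_rate > 0.10:
        actions.append("Improve description")
    if m.average_tokens > 5000:
        actions.append("Optimize content")
    if m.satisfaction < 4:
        actions.append("Review and update")
    return actions


metrics = SkillMetrics(
    activations_per_month=12,
    success_rate=0.75,
    false_positive_rate=0.05,
    average_tokens=6200,
    satisfaction=4.5,
)
print(recommended_actions(metrics))
```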

9.4 Enterprise Skill Governance

Governance Framework

Organizations deploying skills at scale need governance frameworks to ensure quality, security, and compliance.

Governance Components

Skill Registry

  • Centralized catalog
  • Approval workflows
  • Access controls
  • Compliance tracking

Quality Gates

  • Automated testing
  • Security scanning
  • Performance benchmarks
  • Peer review

Lifecycle Management

  • Version control
  • Deprecation policies
  • Update notifications
  • Rollback procedures

Monitoring & Audit

  • Usage analytics
  • Performance metrics
  • Compliance logs
  • Security audits
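
As one illustration of how the registry, lifecycle, and audit components might fit together, the sketch below models a registry entry whose status changes are restricted to approved transitions and recorded in an audit log. The status names and transition policy are hypothetical. A real governance platform would define its own.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


# Allowed lifecycle moves (hypothetical policy).
TRANSITIONS = {
    Status.DRAFT: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.DRAFT},
    Status.APPROVED: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


@dataclass
class RegistryEntry:
    name: str
    version: str
    status: Status = Status.DRAFT
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def transition(self, new_status: Status, actor: str) -> None:
        """Move to a new status if policy allows it, recording who made the change."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.audit_log.append((actor, self.status.value, new_status.value))
        self.status = new_status


entry = RegistryEntry(name="api-security", version="1.0.0")
entry.transition(Status.IN_REVIEW, actor="alice")
entry.transition(Status.APPROVED, actor="bob")
print(entry.status.value, entry.audit_log)
```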

9.5 Future Developments

Q1 2026

Enhanced metadata standards for better discovery, automated quality scoring, cross-platform skill marketplaces.

Q2 2026

Skill composition frameworks, versioning APIs, automated testing tools, performance benchmarking suites.

Q3 2026

AI-assisted skill generation, dynamic skill optimization, advanced analytics dashboards, enterprise governance platforms.

Q4 2026

Industry-specific skill libraries, federated skill registries, standardized evaluation frameworks, certification programs.

10. Conclusion and Key Takeaways

10.1 Summary

Agent Skills represent a fundamental advancement in how we extend AI agent capabilities [14, LangChain, 2024-2025]. By providing a standardized, efficient, and portable format for packaging procedural knowledge, they let agents discover and load specialized knowledge on demand and let teams reuse that knowledge across platforms.

10.2 Critical Success Factors

1. Description Quality

Invest time in crafting precise descriptions that maximize semantic matching accuracy.

2. Single Responsibility

Keep skills focused on one clear purpose for better composability and maintainability.

3. Token Efficiency

Optimize content to stay under 5,000 tokens while maintaining clarity.

4. Lifecycle Management

Implement versioning, monitoring, and governance for production deployments.

10.3 Getting Started

Quick Start Checklist

  1. Identify a knowledge gap or repeated workflow in your organization
  2. Create a skill directory with SKILL.md using the basic template
  3. Write a clear, specific description with use cases
  4. Document step-by-step instructions in the markdown body
  5. Test activation with relevant queries on your target platform
  6. Refine based on activation accuracy and outcome quality
  7. Deploy to your team and monitor usage metrics
  8. Iterate based on feedback and performance data

10.4 Resources

The Future of Agent Skills

With 71,000+ community-created skills, adoption by major platforms (Microsoft, OpenAI, Anthropic, Cursor), and emerging lifecycle management platforms, Agent Skills are positioned to become the standard way organizations package and share AI agent knowledge. The separation of "know-how" from "can-do" enables building agents that are not just powerful but also reliable, compliant, and efficient, all essential characteristics for enterprise deployment.

Implementation Examples

Practical Claude Code patterns for implementing the skill concepts from this section. These examples demonstrate custom agent definitions, skill composition, and tool restriction patterns, drawing on in-context learning research [3, Xie et al., 2024].

Custom Agent Definition

Define specialized agents with focused prompts and tool sets. This implements the skill separation pattern described in Section 2, where "know-how" (the prompt) is separated from "can-do" (the tools) [10, Chen et al., 2024].

Python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

# Define a specialized code review agent
code_reviewer = AgentDefinition(
    description="Expert code reviewer for security and quality analysis.",
    prompt="""You are a senior security engineer conducting code review.
Focus on:
- Security vulnerabilities (injection, XSS, auth issues)
- Performance bottlenecks
- Code quality and maintainability
Provide actionable, specific recommendations.""",
    tools=["Read", "Glob", "Grep"],  # Read-only access
)


async def main() -> None:
    # "Task" is included so the main agent can delegate to the subagent
    async for message in query(
        prompt="Use the code-reviewer agent to analyze src/auth/",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Task"],
            agents={"code-reviewer": code_reviewer},
        ),
    ):
        # Print only the final result message
        if hasattr(message, "result"):
            print(message.result)


# `async for` must run inside a coroutine, so drive it with asyncio
asyncio.run(main())

TypeScript
import { query, AgentDefinition } from "@anthropic-ai/claude-agent-sdk";

// Define a specialized code review agent
const codeReviewer: AgentDefinition = {
  description: "Expert code reviewer for security and quality analysis.",
  prompt: `You are a senior security engineer conducting code review.
Focus on: security vulnerabilities, performance, code quality.
Provide actionable, specific recommendations.`,
  tools: ["Read", "Glob", "Grep"]
};

for await (const message of query({
  prompt: "Use the code-reviewer agent to analyze src/auth/",
  options: {
    allowedTools: ["Read", "Glob", "Grep", "Task"],
    agents: { "code-reviewer": codeReviewer }
  }
})) {
  if ("result" in message) console.log(message.result);
}

Skill Composition: Multiple Specialized Agents

Compose multiple skills for complex workflows, demonstrating the multi-agent coordination research findings [12, Wu et al., 2024].

Python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

# Skill 1: Documentation expert
doc_writer = AgentDefinition(
    description="Technical documentation specialist.",
    prompt="Write clear, comprehensive documentation for code and APIs.",
    tools=["Read", "Write", "Glob"],
)

# Skill 2: Test writer
test_writer = AgentDefinition(
    description="Unit test and integration test specialist.",
    prompt="Write comprehensive tests with edge cases and mocking.",
    tools=["Read", "Write", "Bash"],
)


async def main() -> None:
    # Compose skills: orchestrator delegates to specialists
    async for message in query(
        prompt="""For the auth module:
1. Use doc-writer to document the public API
2. Use test-writer to add missing unit tests""",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Write", "Glob", "Bash", "Task"],
            agents={
                "doc-writer": doc_writer,
                "test-writer": test_writer,
            },
        ),
    ):
        # Print the final result instead of silently discarding messages
        if hasattr(message, "result"):
            print(message.result)


asyncio.run(main())

Tool Restriction Patterns

Limit tools to create focused, safe agents. This pattern follows the principle of least privilege discussed in security-conscious skill design.

Python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions


async def run(prompt: str, options: ClaudeAgentOptions) -> None:
    """Run one query and print its final result."""
    async for message in query(prompt=prompt, options=options):
        if hasattr(message, "result"):
            print(message.result)


async def main() -> None:
    # Read-only analysis agent (safe for production code)
    await run(
        "Analyze the database schema and identify optimization opportunities",
        ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep"],  # No Write/Edit/Bash
            permission_mode="default",  # Other tools still require approval
        ),
    )

    # Edit-enabled agent; file edits are auto-approved
    await run(
        "Refactor utils.py to improve error handling",
        ClaudeAgentOptions(
            allowed_tools=["Read", "Edit", "Glob"],  # Edit but no Write (safer)
            permission_mode="acceptEdits",  # Auto-approve edits
        ),
    )


asyncio.run(main())

GSD Skill Integration

The GSD workflow system implements skills through structured markdown files, following the composable workflow patterns from academic research [8, Shen et al., 2024].

Bash
# GSD skills are defined in .claude/get-shit-done/
# Each workflow is a skill with specific triggers and outputs

# Trigger the planning skill
claude "/gsd:initialize"

# Execute a specific plan (uses plan execution skill)
claude "/gsd:execute-phase .planning/phases/01-setup/01-01-PLAN.md"

# Skills can be composed: planning -> execution -> summary

GSD Agent Inventory

GSD implements skill composition patterns from academic research through its agent definition system. Each agent specifies identity via name/description metadata (~50 tokens loaded at discovery), capability boundaries via tools list, and procedural knowledge via Markdown body (loaded when activated).

| GSD Agent | Purpose | Key Pattern | Research Mapping |
|---|---|---|---|
| gsd-executor | Execute plans with atomic commits | Deviation rules, checkpoints | ReAct bounded autonomy |
| gsd-verifier | Goal-backward verification | Must-have checking | Outcome-focused planning [12] |
| gsd-planner | Create executable plans | Task breakdown, dependency analysis | Tool learning decomposition [4] |
| gsd-phase-researcher | Domain research | Context7 first, verify before asserting | Retrieval-augmented generation |

Enhancement Ideas

References

Academic Papers

  1. Xie, S. M., Raghunathan, A., Liang, P., & Ma, T. (2024). "An Explanation of In-Context Learning as Implicit Bayesian Inference." ICLR 2024.
  2. Qin, Y., Hu, S., Lin, Y., et al. (2024). "Tool Learning with Foundation Models." ACL 2024.
  3. Schick, T., Dwivedi-Yu, J., Dessi, R., et al. (2024). "Toolformer: Language Models Can Teach Themselves to Use Tools." NeurIPS 2024.
  4. Wei, J., Hou, L., Lampinen, A., et al. (2024). "Symbol Tuning Improves In-Context Learning in Language Models." NeurIPS 2024.
  5. Dziri, N., Lu, X., Sclar, M., et al. (2024). "Faith and Fate: Limits of Transformers on Compositionality." NeurIPS 2024.
  6. Shen, Y., Song, K., Tan, X., et al. (2024). "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face." NeurIPS 2024.
  7. Chen, M., Tworek, J., Jun, H., et al. (2024). "Skill-it: A Data-Driven Skills Framework for Understanding and Training Language Models." NeurIPS 2024.
  8. Zhou, C., Liu, P., Xu, P., et al. (2024). "LIMA: Less Is More for Alignment." NeurIPS 2024.
  9. Wu, Q., Bansal, G., Zhang, J., et al. (2024). "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation." arXiv preprint.
  10. Wang, Z., Zhang, J., & Chen, D. (2024). "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models." arXiv preprint.
  11. Hong, S., Zhuge, M., Chen, J., et al. (2024). "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework." ICLR 2024.

Industry Sources

  1. Anthropic. "Equipping agents for the real world with Agent Skills." Engineering Blog, October 2025.
  2. Arcade AI. "Skills vs Tools for AI Agents: Production Guide." January 2026.
  3. Tessl. "Announcing skills on Tessl: the package manager for agent skills." January 2026.
  4. LangChain. "Tools Documentation." Framework Documentation, 2024-2025.
  5. OpenAI. "Function Calling Guide." API Documentation, 2024-2025.

Additional Sources

This comprehensive guide was also compiled from the following sources: