Research current as of: January 2026
AI agentic programming represents a transformative paradigm shift in artificial intelligence, moving beyond simple question-answering systems to autonomous agents capable of planning, executing complex tasks, and interacting with external tools and environments [1: A Survey on Large Language Model based Autonomous Agents]. Unlike traditional AI systems that operate within predetermined parameters, agentic AI systems are goal-driven, self-adjusting, and capable of learning from interactions [2: The Rise and Potential of Large Language Model Based Agents: A Survey].
AI agentic programming is an emerging paradigm where Large Language Models (LLMs) autonomously plan, execute, and interact with external tools to achieve user-defined goals [3: LLM Powered Autonomous Agents]. Rather than following fixed rules, agentic AI systems think through problems, take actions using tools, observe results, and iteratively decide what to do next until the objective is achieved.
The year 2025 marked a pivotal transition point, as AI agents moved from experimental prototypes to production-ready autonomous systems. Gartner predicts that by 2028 agentic AI will be embedded in 33% of enterprise software applications, up from just 1% in 2024, and that 85% of customer interactions will be managed by AI by 2026.
Understanding the fundamental differences between agentic AI and traditional automation is critical for determining when to apply each approach. While both seek to improve efficiency and reduce manual work, they operate on fundamentally different principles.
| Dimension | Traditional Automation | Agentic AI |
|---|---|---|
| Decision-Making | Follows predetermined rules and fixed workflows. Excel at specific, well-defined tasks within predetermined parameters. | Autonomous, goal-driven systems that process information and act independently to reach set goals. Can set goals, learn from experiences, and adapt actions to fit larger objectives. |
| Adaptability | Static workflows requiring human intervention to handle exceptions. Built with RPA, workflow engines, or low-code platforms following predefined instructions. | Dynamic and adaptive, can navigate ambiguity, self-heal when encountering errors, and optimize performance without human intervention. |
| Scope | Automates single repetitive tasks (e.g., invoice classification, data entry). | Manages entire workflows end-to-end (e.g., receiving invoice, validating, reconciling with purchase orders, triggering payments, escalating exceptions). |
| Human Role | Requires constant supervision. Humans always "in-the-loop" for quality control. | Humans shift to supervisory role, setting objectives and guardrails while AI handles execution. |
| Learning | No learning capability. Must be explicitly reprogrammed to handle new scenarios. | Self-adjusting and capable of learning from interactions. Monitors performance, detects anomalies, and optimizes continuously. |
A 2025 analysis found that companies deploying AI agents report operational efficiency gains exceeding 50% and cost reductions of around 35%. A study by AskUI found that companies using agentic AI saw a 30% reduction in customer support queries due to the AI's ability to navigate ambiguity and escalate issues intelligently.
Traditional Automation: Best for repeatable, tightly regulated processes where consistency and compliance matter most. Use when workflows are structured, consistent, and predictable.
Agentic AI: Better for work that benefits from adaptability, judgment, and autonomy. Use for complex, multi-step tasks requiring decision-making across variable conditions.
Hybrid Approach: The consensus in 2025 is that the real advantage comes from knowing how to blend the two approaches, not choosing one over the other.
Agentic AI systems are built on several foundational architectural components and patterns. Understanding these patterns is essential for designing effective agent systems.
Every agentic AI system comprises several essential modules, typically covering perception, memory, planning and reasoning, and action (tool use) [7: Cognitive Architectures for Language Agents (CoALA)].
A single-agent system uses an AI model, a defined set of tools, and a comprehensive system prompt to autonomously handle a user request. The agent relies on the model's reasoning capabilities to interpret requests, plan steps, and decide which tools to use.
For early agent development, starting with a single-agent system allows focus on refining core logic, prompts, and tool definitions before adding more complex architectural components. Single-agent architectures work well for tasks with clear, linear workflows.
A multi-agent system orchestrates multiple specialized agents to solve complex problems by decomposing large objectives into smaller sub-tasks and assigning each to a dedicated agent with specific skills [9: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation]. Agents then interact through collaborative or hierarchical workflows.
Multi-agent patterns provide a modular design that can improve scalability, reliability, and maintainability compared to a single agent with a monolithic prompt. The adoption of multi-agent architectures has exploded, with 72% of enterprise AI projects now involving multi-agent systems, up from 23% in 2024.
This pattern supports layered structured thought processes, where high-level agents handle long-term planning and strategic decision-making, while low-level agents manage real-time data handling and tactical execution. This mirrors organizational structures in human enterprises.
Hybrid models combine aspects of single-agent, multi-agent, and hierarchical designs to maximize flexibility, integrating multiple architectural paradigms to enable context-switching, diverse task handling, and enhanced dynamic awareness.
A single agent takes the lead, receives a trigger, breaks the task into sub-tasks, delegates each to a specialized agent, and ensures agents run in the right order with proper context and output flow.
Use case: Complex workflows requiring centralized coordination and quality control.
Each agent has its own tools and communicates directly with others to coordinate tasks, with no lead agent. Agents operate as peers in a distributed system.
Use case: Decentralized systems where no single point of control is desired.
A main planner breaks tasks into subtasks for specialized agents. This is the most common enterprise pattern, implemented by frameworks like CrewAI.
Use case: Business processes with clear task decomposition and specialization needs.
Agents share outputs iteratively and refine each other's results through feedback loops, mimicking human team collaboration.
Use case: Creative tasks, problem-solving, and scenarios requiring diverse perspectives.
Selecting the right architecture depends on the task at hand, the environment the agent operates in, and the level of autonomy required. Each architecture pattern represents a different approach to solving key challenges in agent design: coordination, specialization, scalability, control flow, and human collaboration.
The ReAct (Reasoning and Acting) pattern is the foundational design pattern for modern AI agents, combining chain-of-thought reasoning with the ability to take actions through tools [4: ReAct: Synergizing Reasoning and Acting in Language Models]. This enables AI systems to solve complex problems autonomously by creating a feedback loop between thinking and doing.
ReAct is built on the intuition that by creating a feedback loop between the assistant and the tools, complex questions can be answered with better accuracy [4: ReAct: Synergizing Reasoning and Acting in Language Models]. Instead of following fixed rules, ReAct agents think through problems, take actions like searching or running code, observe the results, and then decide what to do next.
The loop continues until the agent has sufficient information to provide a final answer.
The agent analyzes the user's request and current context to determine what action to take next. It reasons about which tools might help and what information is still needed. This leverages the LLM's chain-of-thought capabilities to break down complex problems.
Based on its reasoning, the agent selects and invokes a specific tool with appropriate arguments. Actions might include searching the web, querying a database, performing calculations, calling an API, or executing code. The agent must correctly format the tool request based on the tool's schema.
The agent receives and processes the tool's output, incorporating new information into its understanding. It then decides whether to take another action (returning to the Reasoning phase) or provide a final answer to the user.
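To make the three phases concrete, here is a minimal, self-contained sketch of the reason-act-observe cycle in plain Python. The tool registry and the `llm_decide` stub are hypothetical stand-ins for a real model call:

```python
# Minimal ReAct loop sketch. TOOLS and llm_decide are illustrative stubs;
# in a real agent, llm_decide would be an LLM call.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
    "search": lambda q: f"no results for {q!r}",
}

def llm_decide(history):
    """Stub for the reasoning step: decide on a tool call or a final answer."""
    if not any(step["type"] == "observation" for step in history):
        return {"type": "action", "tool": "calculator", "input": "2 + 2"}
    return {"type": "final", "answer": history[-1]["content"]}

def react_loop(task, max_steps=5):
    history = [{"type": "task", "content": task}]
    for _ in range(max_steps):
        decision = llm_decide(history)                       # Reason
        if decision["type"] == "final":
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["input"])  # Act
        history.append({"type": "observation", "content": result})  # Observe
    return "step budget exhausted"

print(react_loop("What is 2 + 2?"))  # → 4
```

The essential structure carries over to real agents: only `llm_decide` changes, becoming a model call that emits either a structured tool invocation or a final answer.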
As of 2025, LangChain's development team recommends using LangGraph for all new ReAct agent implementations [16: LangGraph: Building Stateful AI Agents]. While LangChain agents continue to be supported, LangGraph offers a more flexible and production-ready architecture for complex workflows.
LangGraph implements ReAct through a graph-based state machine, typically composed of a shared state (the message history), an agent node that calls the model, a tool node that executes requested tool calls, and conditional edges that route between them until the agent produces a final answer.
Beyond LangGraph, several platforms have emerged with built-in ReAct support:
Strengths: Interleaving reasoning with tool calls grounds the model's decisions in observed results, produces an interpretable reasoning trace, and reduces hallucination compared to free-form generation.
Challenges: Each think-act-observe cycle adds latency and token cost, errors in early steps can compound, and poorly constrained agents can loop without converging on an answer.
ReAct patterns are particularly well-suited for: research and information-gathering tasks, debugging and diagnostics, and multi-step questions where intermediate evidence must be gathered before answering.
Tool calling (also referred to as function calling) is the cornerstone capability that enables AI agents to move beyond conversation and take action in the real world [5: Toolformer: Language Models Can Teach Themselves to Use Tools]. This pattern focuses on "doing" rather than just "knowing," transforming LLMs from passive responders into active participants in complex workflows.
Tool calling is when an AI agent uses an external function or API to complete a task that the model cannot do on its own [6: Tool Learning with Foundation Models]. Rather than generating text responses, the agent identifies the need for external capabilities, selects appropriate tools, constructs valid requests, and processes the results.
Tool use requires function/tool schemas with detailed definitions that enable the LLM to understand what tools are available and how to construct valid requests. A typical tool schema includes the tool's name, a natural-language description of what it does, a structured definition of its parameters (names, types, constraints), and which parameters are required.
These schemas typically follow JSON Schema conventions (the same schema language used by the OpenAPI specification), enabling standardization across different agent frameworks and platforms.
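As an illustration, here is a hypothetical weather-lookup tool defined in the JSON Schema style commonly used for function calling. The tool name and fields are invented for this example; the exact envelope varies by provider:

```python
import json

# Illustrative tool schema in the JSON-Schema style used for function calling.
# The name/description/parameters layout is the common pattern; check your
# provider's docs for the exact wrapper format.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

print(json.dumps(get_weather_tool, indent=2))
```

The `description` fields matter as much as the types: they are what the model reads when deciding whether and how to call the tool.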
The agent makes direct HTTP requests to APIs using libraries like requests or axios.
Pros: Lower latency for simple requests, full control over implementation
Best for: Quick prototypes, connecting to 1-2 well-documented APIs
Model-native pattern popularized by OpenAI, Google, and Anthropic where you define available "tools" with structured schemas.
Pros: Native LLM support, standardized approach
Best for: Production agents with multiple tools
A single, standardized API for an entire category of software (e.g., all CRMs), with the platform handling translation to specific providers.
Pros: Reduced integration complexity, multi-provider support
Best for: Apps connecting to multiple SaaS providers in the same category
Emerging standard for tool discovery and execution, enabling standardized connections between AI models and external tools.
Pros: Industry standard, extensive ecosystem
Best for: Enterprise agents requiring broad tool compatibility
Decentralized pattern enabling autonomous agents to communicate and delegate tasks directly to one another.
Pros: Enables multi-agent collaboration at scale
Best for: Sophisticated multi-agent systems with autonomous coordination
2025 was marked as the year of AI agents, with organizations deploying AI agents widely across industries and functions. Popular frameworks with robust tool calling support include LangGraph, CrewAI, AutoGen, LlamaIndex, and the OpenAI Agents SDK.
The Model Context Protocol has emerged as the de facto standard for connecting AI systems to real-world data and tools [14: Model Context Protocol (MCP)]. Twelve months after its launch in November 2024, MCP has been adopted by OpenAI, Google DeepMind, Microsoft, and thousands of developers building production agents.
The Model Context Protocol is an open standard and open-source framework introduced by Anthropic to standardize the way artificial intelligence systems like large language models (LLMs) integrate and share data with external tools, systems, and data sources [14: Model Context Protocol (MCP)]. MCP enables developers to build secure, two-way connections between their data sources and AI-powered tools.
Instead of building custom integrations for every LLM-to-tool connection, MCP provides a universal protocol. Build an MCP server once, and any MCP-compatible client (Claude, ChatGPT, Gemini, custom agents) can use it.
The MCP architecture is straightforward and follows a client-server model: a host application (such as an IDE or chat client) runs one or more MCP clients, each maintaining a connection to an MCP server, and servers expose tools, resources, and prompts that the model can use.
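As a sketch of what travels over the wire, MCP messages are JSON-RPC 2.0. The helper below builds a `tools/call` request; the field names follow the published spec, but consult the spec for the full envelope (initialization, capability negotiation, and so on). The tool name and arguments here are hypothetical:

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request. MCP messages are JSON-RPC 2.0;
    this shows only the call itself, not the initialization handshake."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

req = mcp_tool_call(1, "query_database", {"sql": "SELECT 1"})
print(json.dumps(req, indent=2))
```

Because every server speaks this same message shape, a client that can issue `tools/list` and `tools/call` can drive any MCP server without custom integration code.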
The MCP ecosystem has grown rapidly since its introduction.
MCP has been adopted by major AI platforms and enterprise infrastructure providers, including OpenAI, Google DeepMind, and Microsoft.
At Block, MCP tools help refactor legacy software, migrate databases, run unit tests, and automate repetitive coding tasks. Design, product, and customer support teams use MCP-powered Goose to generate documentation, process tickets, and build prototypes.
Data teams rely on MCP to connect with internal systems, with integrations to Snowflake, Jira, Slack, Google Drive, and internal task-specific APIs. Thousands of Block's employees use Goose, cutting the time spent on daily engineering tasks by up to 75%.
At the MCP Developer Summit, Sabhav Kothari, Head of AI Productivity at Bloomberg, described how his team uses MCP internally to help AI developers move demos into production faster.
Cloudflare transforms MCP from a local-only technology into a scalable, cloud-based solution. By hosting MCP servers in the cloud, Cloudflare removes the need for users to configure and maintain servers locally, enabling enterprise-scale deployments.
MCP connects Electronic Health Records (EHRs), symptom checkers, and diagnostic tools. AI assistants suggest treatment using real-time vitals plus patient history, resulting in safer, faster clinical decisions.
Risk analysis requires multi-model inputs (credit, fraud, behavioral data). MCP aggregates inputs from multiple data sources. AI bankers can review credit scores, transactions, and fraud alerts in one session, leading to smarter financial advice and approvals.
MCP servers connect CRM, market intelligence platforms, content management systems, and internal knowledge bases. Sales teams typically see improvements in conversion rates (10-25% increases), deal velocity, and more accurate sales forecasting.
MCP Servers can be leveraged by CISOs, SOC teams, red teamers, and threat intelligence analysts to automate real-world cybersecurity scenarios, integrating threat intelligence feeds, SIEM systems, and security tools.
75% of companies are planning to implement MCP in the next year, signaling widespread enterprise adoption. The donation of MCP to the Linux Foundation's Agentic AI Foundation positions it as a long-term industry standard, similar to how other open protocols have shaped technology ecosystems.
While the Model Context Protocol focuses on how agents use tools, the Agent2Agent Protocol (A2A) addresses how agents communicate with each other [15: Agent2Agent Protocol (A2A)]. Launched by Google in April 2025, A2A provides a universal standard for agent-to-agent communication, enabling agents built on different vendors, frameworks, or clouds to find, communicate, and collaborate securely.
The Agent2Agent (A2A) protocol allows AI agents to communicate with each other, securely exchange information, and coordinate actions across various enterprise platforms or applications. A2A provides a decentralized pattern enabling autonomous agents to delegate tasks directly to one another, forming the foundation for sophisticated multi-agent systems.
Agent Cards are JSON-format documents that describe what an agent can do, similar to an API specification. They include the agent's name and description, its service endpoint URL, supported capabilities (such as streaming), the skills it offers, and its authentication requirements.
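Here is a sketch of what an Agent Card might look like. The agent name, URL, and skill are invented for this example, and the field names approximate the published A2A schema; check the spec for the authoritative shape:

```python
import json

# Illustrative A2A Agent Card. All values are hypothetical; field names
# approximate the A2A schema (name, description, url, capabilities, skills).
agent_card = {
    "name": "billing-agent",
    "description": "Processes refunds and billing inquiries.",
    "url": "https://agents.example.com/billing",
    "capabilities": {"streaming": True},
    "skills": [
        {"id": "process-refund", "description": "Issue a refund for an order."}
    ],
}

print(json.dumps(agent_card, indent=2))
```

A client agent fetches this card, inspects the advertised skills, and then delegates a task to the endpoint, without knowing anything about the remote agent's framework or vendor.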
A2A is oriented towards task completion with defined lifecycles, supporting both quick request/response interactions and long-running tasks that report progress through streaming updates or push notifications.
The protocol is built on top of existing standards including HTTP, SSE (Server-Sent Events), and JSON-RPC, making it easier to integrate with existing IT stacks without requiring entirely new infrastructure.
While Anthropic's Model Context Protocol focused on how agents use tools, Agent2Agent addressed how agents communicate with each other. Crucially, the two protocols were designed to work together: MCP connects an agent to its tools and data, while A2A connects agents to one another.
A customer service agent (on one platform) can discover and delegate to a billing agent (on another platform) to process a refund, even if they were built by different vendors using different frameworks.
Inventory agents, shipping agents, and demand forecasting agents from different organizations can coordinate in real-time to optimize supply chain operations.
Healthcare providers can allow their AI agents to securely communicate with pharmacy agents, insurance verification agents, and lab result agents while maintaining data privacy and sovereignty.
The donation of A2A to the Linux Foundation signals a commitment to open, vendor-neutral governance. This follows the pattern established by MCP's donation to the Agentic AI Foundation, creating a cohesive ecosystem of open protocols for the agentic AI era.
The combination of MCP (tool use) and A2A (agent communication) provides a complete foundation for building interoperable, scalable multi-agent systems across organizational and platform boundaries.
Beyond MCP and A2A, several other protocols and standards have emerged to address specific aspects of agent communication and coordination.
IBM's Agent Communication Protocol (ACP) was an open, vendor-neutral standard under the Linux Foundation for standardizing communication between AI agents. ACP defined RESTful, HTTP-based interfaces for task invocation, lifecycle management, and both synchronous/asynchronous messaging.
However, ACP has merged with A2A under the Linux Foundation umbrella, with the ACP team winding down active development and contributing its technology and expertise to A2A. This consolidation reduces fragmentation and strengthens the A2A ecosystem.
A modern communication standard, AGP acts as a gateway for secure, high-throughput messaging between distributed agents.
Use case: High-frequency trading systems, real-time IoT agent networks, gaming AI agents
Academic research has identified critical directions for next-generation protocols.
According to Gartner's 2025 research, 40% of enterprise applications will integrate AI agents by 2026, yet communication barriers remain the primary cause of implementation failures. Organizations using standardized protocols reduce integration time by 60-70% compared to custom development.
The Linux Foundation's creation of the Agentic AI Foundation in late 2025 signals an effort to establish shared standards and best practices that could play a role similar to the World Wide Web Consortium (W3C) in shaping an open, interoperable agent ecosystem.
The convergence around MCP and A2A as primary standards, backed by major technology companies and governed by the Linux Foundation, suggests the industry is moving toward a consolidated, interoperable ecosystem rather than competing proprietary protocols.
Recent research has identified a collection of 12 agentic design patterns categorized as Foundational, Cognitive & Decisional, Execution & Interaction, and Adaptive & Learning. These patterns offer reusable, structural solutions to recurring problems in agent design.
The planning-agent pattern is useful for tasks requiring coordinated action sequences [11: Understanding the Planning of LLM Agents: A Survey]. The agent breaks down complex goals into step-by-step plans, evaluates alternative approaches, and executes actions in a logical sequence.
Example: A travel booking agent that plans multi-city itineraries by breaking down the task into flight search, hotel booking, ground transportation, and activity scheduling.
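A minimal sketch of the planning pattern: a stub planner decomposes the goal into ordered subtasks (hard-coded here to mirror the travel example; a real agent would ask an LLM for the decomposition), and the executor runs them in sequence:

```python
def plan(goal):
    """Stub planner. A real agent would prompt an LLM to decompose the goal;
    this hard-coded travel-booking decomposition is for illustration only."""
    return [
        "search flights",
        "book hotels",
        "arrange ground transport",
        "schedule activities",
    ]

def execute_plan(goal):
    """Execute planned steps in a logical sequence, collecting results."""
    results = []
    for step in plan(goal):
        results.append(f"done: {step}")  # a real agent would act via tools here
    return results

print(execute_plan("book a 3-city trip"))
```

More sophisticated planners re-plan after each step, so a failed hotel booking can trigger an alternative itinerary instead of aborting the whole plan.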
The reflection agent is useful for tasks requiring improvement over time [10: Reflexion: Language Agents with Verbal Reinforcement Learning]. The agent stores results, compares them to goals, evaluates performance, and updates its strategy based on outcomes.
Example: A code review agent that learns from accepted vs. rejected suggestions to improve its recommendations over time.
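The generate-critique-revise cycle behind reflection can be sketched as follows. `generate` and `critique` are stubs standing in for LLM calls; the loop structure is the point:

```python
def generate(task, feedback=None):
    """Stub generator; a real agent would call an LLM, conditioning on feedback."""
    return "draft v2" if feedback else "draft v1"

def critique(output, goal):
    """Stub evaluator; returns None when the output meets the goal,
    otherwise a textual critique to feed back into the next attempt."""
    return None if output == "draft v2" else "too short"

def reflect_loop(task, goal, max_rounds=3):
    feedback = None
    output = None
    for _ in range(max_rounds):
        output = generate(task, feedback)   # produce an attempt
        feedback = critique(output, goal)   # compare the result to the goal
        if feedback is None:                # good enough: stop refining
            return output
    return output  # best effort after the round budget is spent

print(reflect_loop("summarize the report", "clear and complete"))
```

The `max_rounds` budget matters in practice: without it, a critic that is never satisfied would keep the agent revising forever.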
As covered extensively in previous sections, this pattern enables agents to extend their capabilities by invoking external functions, APIs, and services. This is the foundation of practical agent systems.
Combining reasoning and acting in an iterative loop, as detailed earlier. This is the most widely adopted foundational pattern.
Multiple specialized agents work together, each contributing unique expertise. Coordination can be hierarchical (supervisor-worker), peer-to-peer (network), or collaborative (shared workspace).
Example: A software development team of agents including a requirements analyst, architect, coder, tester, and deployment specialist.
Tasks are processed through a defined sequence of steps, with each step potentially involving different agents or tools. This pattern ensures consistency and auditability.
Example: Document approval workflows in enterprises, where documents flow through drafting → review → approval → publication stages.
Critical decisions or uncertain situations trigger human involvement. The agent recognizes its limitations and requests human guidance when confidence is low or stakes are high.
Example: Medical diagnosis agents that provide recommendations but require physician approval before treatment.
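The escalation decision at the heart of human-in-the-loop design can be sketched as a simple gate on confidence and stakes. The threshold and the stakes labels here are illustrative, not prescriptive:

```python
def handle_decision(recommendation, confidence, stakes, threshold=0.8):
    """Route a decision: act autonomously only when confidence is high and
    stakes are low; otherwise escalate to a human. The 0.8 threshold and
    the "low"/"high" stakes labels are illustrative choices."""
    if stakes == "high" or confidence < threshold:
        return {
            "action": "escalate",
            "reason": "needs human review",
            "recommendation": recommendation,
        }
    return {"action": "execute", "recommendation": recommendation}

print(handle_decision("apply 10% discount", confidence=0.95, stakes="low"))
print(handle_decision("recommend treatment", confidence=0.95, stakes="high"))
```

Note that high stakes force escalation even at high confidence: this mirrors the medical example above, where a confident diagnosis still requires physician approval.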
A novel dual-paradigm framework categorizes agentic systems into two distinct lineages:
Relies on algorithmic planning, persistent state, and explicit knowledge representation. These systems use traditional AI techniques like search algorithms, logical reasoning, and rule-based systems.
Strengths: Deterministic, explainable, verifiable
Examples: Classical robotics, expert systems, game-playing AI
Leverages stochastic generation, prompt-driven orchestration, and emergent capabilities of large language models. These systems use neural networks for reasoning and decision-making.
Strengths: Flexible, handles ambiguity, natural language interaction
Examples: LLM-based agents, conversational AI, generative workflows
Modern agentic systems increasingly combine both paradigms, using LLMs for high-level reasoning and natural language understanding while employing classical algorithms for precise computation, planning, and constraint satisfaction.
Memory has emerged as a core capability of foundation model-based agents [8: Generative Agents: Interactive Simulacra of Human Behavior]. However, traditional taxonomies such as long/short-term memory have proven insufficient to capture the diversity of contemporary agent memory systems. Modern agents employ sophisticated memory architectures that enable them to maintain context, learn from experience, and build persistent knowledge.
Short-term memory provides an agent with immediate context, current conversation state, and prior exchanges within that session. The most common form of short-term memory is working memory—an active, temporary context accessible during a session.
While short-term memory captures immediate context, the real challenge lies in transforming these interactions into persistent, actionable knowledge that spans across sessions. Long-term memory enables agents to remember and build on previous interactions rather than starting fresh each time.
The extraction process supports three built-in memory strategies:
Semantic memory: extracts facts, concepts, and general knowledge. This is context-independent information that remains true across interactions.
Example: "User prefers Python over JavaScript," "Company uses AWS for cloud infrastructure"
User preference memory: captures individual preferences, habits, and user-specific patterns.
Example: "User prefers concise answers," "Typically works on backend systems"
Summary memory: distills complex information and long interactions into concise summaries for better context management.
Example: Summarizing a month-long project discussion into key decisions and action items
Mem0 provides a scalable framework for building production AI agents with sophisticated long-term memory. Features include parallel processing (multiple memory strategies process independently), automatic memory decay to prevent bloat, and cross-session knowledge persistence.
Zep implements memory as a temporal knowledge graph, capturing not just facts but also relationships, temporal sequences, and contextual connections between information.
A self-organizing memory operating system for structured long-horizon reasoning, enabling agents to autonomously organize and structure their memory over time.
A scalable agentic memory framework specifically designed for personalized conversational AI, managing user-specific context across extended conversations.
It's crucial to decay stored memories in AI systems to prevent memory bloat and maintain efficiency. As an AI agent interacts over time, it accumulates a massive amount of information, some of which becomes irrelevant or outdated.
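One common decay approach is exponential recency weighting, sketched below with illustrative constants (a 30-day half-life and a pruning threshold); real systems tune these and usually combine decay with relevance scoring:

```python
import time

def memory_score(importance, last_access_ts, now=None, half_life_days=30.0):
    """Exponential recency decay: a memory's retrieval score halves every
    half_life_days since its last access. Constants are illustrative."""
    now = now if now is not None else time.time()
    age_days = (now - last_access_ts) / 86400
    return importance * 0.5 ** (age_days / half_life_days)

def prune(memories, threshold=0.1, now=None):
    """Drop memories whose decayed score falls below the threshold."""
    return [
        m for m in memories
        if memory_score(m["importance"], m["last_access"], now) >= threshold
    ]

now = 1_700_000_000
memories = [
    {"text": "user prefers Python", "importance": 1.0, "last_access": now - 5 * 86400},
    {"text": "one-off typo fixed", "importance": 0.2, "last_access": now - 120 * 86400},
]
print([m["text"] for m in prune(memories, now=now)])  # → ['user prefers Python']
```

Accessing a memory can also reset `last_access`, so frequently useful memories resist decay while stale ones fade out.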
Modern memory systems implement decay mechanisms such as time-based expiry, relevance-weighted scoring, and periodic consolidation of older memories into summaries.
In multi-agent systems, shared memory enables coordination: agents can read each other's intermediate results, avoid duplicating work, and build on one another's findings.
Frameworks such as LangGraph (with MongoDB, Redis, or another persistence backend) provide integrated memory solutions for checkpointing agent state and storing long-term memories across sessions.
Three major frameworks have emerged as leaders in the multi-agent space: LangGraph, CrewAI, and AutoGen [9: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation]. Each takes a different architectural approach to agent collaboration.
LangGraph adopts a graph-based workflow design that treats agent interactions as nodes in a directed graph, providing exceptional flexibility for complex decision-making pipelines with conditional logic, branching workflows, and dynamic adaptation [16: LangGraph: Building Stateful AI Agents].
CrewAI's strength lies in its intuitive approach to agent coordination and built-in support for common business workflow patterns. CrewAI prioritizes role-based team coordination, excelling in scenarios where multiple agents collaborate.
AutoGen focuses on conversational agent architecture, emphasizing natural language interactions and dynamic role-playing. It excels at creating flexible, conversation-driven workflows where agents can adapt their roles based on context.
| Use Case | Recommended Framework | Rationale |
|---|---|---|
| Precise control over workflow logic | LangGraph | Explicit state machines with branching control |
| Business process automation | CrewAI | Role-based coordination matches organizational structures |
| Conversational multi-agent systems | AutoGen | Natural language interactions between agents |
| RAG-centric applications | LlamaIndex | Optimized for retrieval and data integration |
| OpenAI-native development | OpenAI Agents | Seamless integration with OpenAI models and APIs |
72% of enterprise AI projects now involve multi-agent architectures, up from 23% in 2024. The shift from single agents to orchestrated multi-agent AI workflows is accelerating across marketing, SaaS, and e-commerce verticals.
The fundamentals of AI agentic programming rest on a foundation of well-established patterns (ReAct, Tool Use, Planning, Reflection), emerging standards (MCP, A2A), and sophisticated architectural approaches (single-agent, multi-agent, hierarchical, hybrid). The rapid standardization around MCP and A2A, backed by major technology companies and the Linux Foundation, signals a maturing ecosystem moving toward interoperability and production readiness.
Key takeaways:
As we move into Section 2, we'll explore how these fundamental patterns are packaged into reusable, modular "skills" that enable rapid agent development and deployment.
Practical Claude Code patterns demonstrating the fundamental concepts from this section. These examples use the Claude Agent SDK to implement ReAct patterns, tool use, and basic agent queries.
The simplest agent pattern: a single query with tool access. This implements the foundation described in the ReAct research [4: ReAct: Synergizing Reasoning and Acting in Language Models], where reasoning and acting are interleaved.
```python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions


async def main():
    # Basic ReAct loop: reason about task, act with tools, observe results
    async for message in query(
        prompt="Analyze the authentication module and suggest improvements",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep"],
            permission_mode="default"
        )
    ):
        if hasattr(message, "result"):
            print(message.result)

asyncio.run(main())
```
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Basic ReAct loop: reason about task, act with tools, observe results
for await (const message of query({
  prompt: "Analyze the authentication module and suggest improvements",
  options: {
    allowedTools: ["Read", "Glob", "Grep"],
    permissionMode: "default"
  }
})) {
  if ("result" in message) console.log(message.result);
}
```
Demonstrates the think-act-observe loop described in Section 1, showing how Claude reasons about each step before taking action. This pattern forms the foundation for all agentic behavior [4: ReAct: Synergizing Reasoning and Acting in Language Models].
```python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions


async def react_with_streaming():
    """ReAct pattern with visible reasoning and tool execution."""
    async for message in query(
        prompt="Debug why user login is failing in auth.py",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Grep", "Bash"],
            permission_mode="acceptEdits"
        )
    ):
        # Stream reasoning (think)
        if hasattr(message, "content"):
            print(f"Thinking: {message.content}")
        # Show tool calls (act)
        if hasattr(message, "tool_use"):
            print(f"Action: {message.tool_use.name}")
        # Display observations
        if hasattr(message, "tool_result"):
            print(f"Observation: {message.tool_result[:200]}...")

asyncio.run(react_with_streaming())
```
This example shows how to define custom tools following the patterns described in the Tool Learning research [6: Tool Learning with Foundation Models]. Claude selects appropriate tools based on the task.
```python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions


async def main():
    # Claude Code's built-in tools implement the tool use pattern:
    # Read, Write, Edit, Bash, Grep, Glob - each with a clear schema
    async for message in query(
        prompt="Find all TODO comments in the codebase and create a summary",
        options=ClaudeAgentOptions(
            # Tool selection based on task requirements
            allowed_tools=["Grep", "Glob", "Read", "Write"],
            permission_mode="acceptEdits"
        )
    ):
        pass  # The agent handles tool selection automatically

asyncio.run(main())
```
The simplest way to use Claude Code follows Anthropic's "Building Effective Agents" guidance [13: Building Effective Agents]: start simple, add complexity only as needed.
```shell
# Simple task - Claude reasons and acts autonomously
claude "Refactor the database connection to use connection pooling"

# With specific file context
claude "Review @src/auth/login.py for security issues"

# Multi-step task with planning
claude "Create a REST API endpoint for user registration with validation"
```
The get-shit-done-cc (GSD) workflow system demonstrates how academic research on agent architectures translates into practical implementation. GSD implements research patterns through structured agents, file-based state management, and explicit deviation handling protocols.
The following table maps research concepts discussed throughout this section to their concrete implementations in the GSD system. GSD implements these patterns through a skill-based architecture where each agent is defined in Markdown files with specialized prompts and tool permissions.
| Research Concept | Source | GSD Component | Implementation Pattern |
|---|---|---|---|
| ReAct reasoning-action loop | Yao et al. 2023 [4] | `gsd-executor` | Deviation rules as structured reasoning checkpoints |
| Multi-agent coordination | Wu et al. 2024 [9] | `execute-phase` workflow | Wave-based parallel execution with dependency ordering |
| Goal-directed verification | Huang et al. 2024 [11] | `gsd-verifier` | Goal-backward must-have checking (truths, artifacts, wiring) |
| Episodic + semantic memory | Park et al. 2023 [8] | `STATE.md` + `SUMMARY.md` | Accumulated decisions persist across sessions |
| Cognitive architecture layers | Sumers et al. 2024 [7] | `.claude/agents/*.md` | Agent definitions with specialized prompts and tool access |
| Bounded autonomy | Anthropic 2024 [13] | Plan sizing rules | 2-3 tasks per plan, 50% context budget, checkpoint gates |
Beyond implementing existing research patterns, GSD introduces several novel contributions that advance the state of practice for AI agent workflows:
Structured categorization of when agents should auto-fix issues versus ask for permission. Four-rule system: Bug fixes (auto), Missing critical functionality (auto), Blocking issues (auto), Architectural changes (ask).
Verify observable truths, not task completion. Three-level artifact checking: EXISTS (file present), SUBSTANTIVE (not a stub), WIRED (connected to system).
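The three-level check described above can be sketched as a small function. The five-line threshold and the substring-based wiring heuristic are illustrative simplifications of what a real verifier would do:

```python
import tempfile
from pathlib import Path

def check_artifact(path, min_lines=5, wiring_refs=()):
    """Three-level artifact check sketch: EXISTS (file present), SUBSTANTIVE
    (more than a stub), WIRED (referenced by the given files). The line-count
    threshold and substring heuristic are illustrative choices."""
    p = Path(path)
    exists = p.is_file()
    substantive = exists and len(p.read_text().splitlines()) >= min_lines
    wired = substantive and all(
        Path(ref).is_file() and p.name in Path(ref).read_text()
        for ref in wiring_refs
    )
    return {"exists": exists, "substantive": substantive, "wired": wired}

# Demo: a two-line stub exists but is not substantive
with tempfile.TemporaryDirectory() as d:
    stub = Path(d) / "auth.py"
    stub.write_text("def login():\n    ...\n")
    print(check_artifact(stub))  # exists=True, substantive=False, wired=False
```

The levels are deliberately ordered: a file that fails EXISTS cannot pass SUBSTANTIVE, and a stub cannot count as WIRED no matter what imports it.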
Structured pause points that capture full state for continuation. Three types: human-verify (90%), decision (9%), and human-action (1%, rare). A fresh agent continues from the checkpoint rather than resuming the original session.
Dependency-aware parallel plan execution. Plans grouped by wave number; Wave N runs after Wave N-1 completes. Maximizes parallelism while respecting dependencies.
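Wave assignment can be sketched as follows: each plan's wave is one more than the highest wave among its dependencies, so plans in the same wave have no mutual ordering constraints and can run in parallel. The plan names in the demo are hypothetical:

```python
from collections import defaultdict

def assign_waves(plans):
    """Group plans into execution waves. `plans` maps plan name -> list of
    dependency names. A plan's wave is 1 + the max wave of its dependencies;
    independent plans land in wave 1. Wave N runs after wave N-1 completes."""
    waves = {}

    def wave_of(name, seen=()):
        if name in seen:
            raise ValueError(f"dependency cycle at {name}")
        if name not in waves:
            deps = plans[name]
            waves[name] = 1 + max(
                (wave_of(d, seen + (name,)) for d in deps), default=0
            )
        return waves[name]

    for name in plans:
        wave_of(name)
    grouped = defaultdict(list)
    for name, w in sorted(waves.items()):
        grouped[w].append(name)
    return dict(grouped)

print(assign_waves({"schema": [], "api": ["schema"], "ui": ["api"], "docs": ["schema"]}))
```

Here `api` and `docs` both depend only on `schema`, so they share wave 2 and can execute in parallel, while `ui` must wait for wave 3.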
Cross-session decision memory via STATE.md. Project context persists: technology choices, architectural constraints, pending concerns. Prevents repeated discussions.
GSD's deviation rules implement bounded autonomy by categorizing unexpected situations into clear response patterns. This maps to the ReAct pattern's reasoning phase, but with explicit rules rather than implicit reasoning:
```markdown
# RULE 1: Auto-fix bugs
Trigger: Code doesn't work as intended (errors, incorrect output)
Action: Fix immediately, track for Summary
Examples: Logic errors, type errors, security vulnerabilities

# RULE 2: Auto-add missing critical functionality
Trigger: Missing essential features for correctness/security
Action: Add immediately, track for Summary
Examples: Missing error handling, input validation, auth checks

# RULE 3: Auto-fix blocking issues
Trigger: Something prevents completing current task
Action: Fix immediately to unblock, track for Summary
Examples: Missing dependencies, broken imports, config errors

# RULE 4: Ask about architectural changes
Trigger: Fix requires significant structural modification
Action: STOP, return checkpoint with decision needed
Examples: New database table, switching frameworks, API contracts
```
Based on research patterns, several enhancements could extend GSD capabilities.