Agentic Design Patterns: A Guide for AI Engineers

By

Komy A.

May 9, 2026

11 min read


Why Patterns Matter When Building with LLMs

Most teams start building AI agents the same way: they wire up an LLM, add a few tools, prompt it to "figure things out," and ship it. It works — sometimes. Then they hit a wall. The agent loops endlessly. It misses obvious steps. It calls the wrong tool three times in a row. The system that worked in demos falls apart under real conditions.

The missing piece is usually not the model. It's structure. Specifically, it's the absence of a deliberate agentic design pattern — a principled way of organizing how an agent perceives inputs, reasons, acts, and learns from outcomes.

These patterns have been formalized in recent work from Google, Microsoft, and academic researchers. Andrew Ng framed four of them in a widely shared talk. Antonio Gulli's Agentic Design Patterns sparked enough search interest that developers are still hunting for a PDF of it. The interest is real. What's still scarce is a practical breakdown for engineers who need to choose between patterns, not just understand them theoretically.

This guide covers the five core agentic design patterns, when each one makes sense, and where each one breaks.

The Five Core Agentic Design Patterns

These are not mutually exclusive. Production systems almost always combine them. But understanding what each one does in isolation is how you make good decisions about which combination to reach for.

1. ReAct (Reason + Act)

ReAct is the default pattern for most agent implementations. The agent interleaves reasoning traces with tool calls: think about what to do, call a tool, observe the result, think again, call another tool if needed, and eventually produce a final answer.

The original ReAct paper (Yao et al., 2022, from Princeton and Google Research) showed that interleaving reasoning and acting could outperform approaches that only reasoned or only acted. The key insight is that the reasoning trace gives the model a scratchpad: a way to work through a problem step by step before committing to an action.

Most frameworks implement a version of this. LangChain's AgentExecutor, LlamaIndex's ReActAgent, and OpenAI's function-calling workflow all produce ReAct-style agent loops by default.
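The think, act, observe cycle described above can be sketched in a few lines. This is a minimal illustration, not any framework's implementation: `scripted_model` is a hypothetical stand-in for an LLM call, `lookup_population` is an invented tool, and the `Action: name[argument]` text format is an assumption chosen to keep parsing simple.

```python
from typing import Callable

# Hypothetical tool registry: each tool maps a string argument to a string observation.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_population": lambda city: {"paris": "2100000", "rome": "2800000"}.get(
        city.lower(), "unknown"
    ),
}


def scripted_model(transcript: list[str]) -> str:
    """Stand-in for an LLM call: picks the next step from the transcript so far."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return "Thought: I need the population first.\nAction: lookup_population[Paris]"
    value = observations[-1].removeprefix("Observation: ")
    return f"Thought: I have what I need.\nFinal Answer: {value}"


def react_loop(question: str, model: Callable[[list[str]], str], max_steps: int = 5) -> str:
    """Alternate model replies and tool observations until an answer or the step limit."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(transcript)
        transcript.append(reply)
        last_line = reply.strip().splitlines()[-1]
        if last_line.startswith("Final Answer:"):
            return last_line.removeprefix("Final Answer:").strip()
        if last_line.startswith("Action:"):
            # Parse the assumed "Action: tool_name[argument]" format.
            call = last_line.removeprefix("Action:").strip()
            name, _, rest = call.partition("[")
            argument = rest.rstrip("]")
            tool = TOOLS.get(name.strip())
            observation = tool(argument) if tool else f"unknown tool {name.strip()!r}"
            transcript.append(f"Observation: {observation}")
    return "stopped: step limit reached"
```

The `max_steps` cap is the part worth copying into a real system: it turns an endless loop into an explicit, observable failure.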

When it works well: Straightforward tasks with clear tool boundaries. Research tasks. Q&A with document retrieval. Any workflow where the next step is usually deterministic given the previous observation.

Where it breaks: Tasks that require long-horizon planning with many interdependent steps. ReAct agents tend to lose the plot after five or six tool calls. The reasoning trace grows long, earlier context gets diluted, and the agent starts making decisions based on what is closest in the context window rather than what the overall goal requires. You also cannot easily interrupt or redirect a running ReAct loop without restarting it from scratch.

2. Reflection

The reflection pattern adds a self-evaluation step to the agent loop. After generating an output, the agent critiques its own work — either using the same model or a separate critic model — then revises based on that critique.

This maps to how experienced engineers approach code review or document drafting. You write a first pass, step back and read it critically, then improve it. The reflection pattern operationalizes that instinct.

There are two main variants. Self-reflection uses a single model playing both writer and critic roles, typically via different system prompts. Reflexion (formalized in a 2023 paper from Shinn et al.) extends this by storing verbal feedback in an episodic memory buffer, allowing the agent to avoid repeating mistakes across multiple attempts.
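A minimal sketch of the generate, critique, revise cycle, with a feedback list loosely in the spirit of Reflexion's episodic memory. The functions `toy_generate` and `toy_critique` are hypothetical stand-ins for two differently prompted model calls, and the convention that the critic returns `None` when it has no objections is an assumption of this sketch.

```python
from typing import Callable, Optional


def reflect_and_revise(
    task: str,
    generate: Callable[[str, list[str]], str],
    critique: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Generate a draft, critique it, and revise until the critic is satisfied."""
    feedback_memory: list[str] = []  # verbal feedback kept across attempts
    draft = generate(task, feedback_memory)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # critic has no objections
            break
        feedback_memory.append(feedback)
        draft = generate(task, feedback_memory)
    return draft, feedback_memory


def toy_generate(task: str, feedback: list[str]) -> str:
    """Hypothetical writer: adds a docstring only after being told to."""
    if any("docstring" in note for note in feedback):
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b):\n    return a + b"


def toy_critique(task: str, draft: str) -> Optional[str]:
    """Hypothetical critic with a single rubric item."""
    return None if '"""' in draft else "Missing docstring."
```

Because `critique` is just a callable, the same loop accepts an external verifier (a test suite, a linter, a schema check) in place of a second prompt.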

When it works well: Code generation. Content writing. Any task where output quality matters more than speed, and where the model itself can evaluate correctness. If you have a clear rubric, reflection can substantially improve output quality over a single-pass approach.

Where it breaks: When the model cannot reliably identify its own errors. If an LLM confidently produces wrong output, the same LLM often confidently says the wrong output is fine. Reflection only works if the model's critic capacity is good enough to catch the mistakes its generator makes. For tasks requiring factual accuracy or domain-specific correctness, you often need an external verifier, not just a second prompt.

3. Tool Use

Tool use is often treated as a feature rather than a pattern, but it deserves its own category because the decisions you make around tool design shape the rest of your agent architecture.

An agent with access to tools can do things a pure LLM cannot: retrieve real-time data, write to databases, call external APIs, execute code. The OpenAI function calling spec and Anthropic's tool use API have made this much more reliable than early approaches that relied on LLMs generating function call strings in raw text.

Three pattern-level decisions matter most. The first is how many tools the agent can access at once: more tools mean more capability but also more ambiguity, and an agent with 50 tools will select the wrong one more often than one with 5 well-scoped tools. The second is whether tools are deterministic or have side effects: read-only tools are safe to retry, write tools are not. The third is error handling: you need explicit retry limits and fallback logic for tool errors, not just silent retries.
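One way to encode the retry and side-effect decisions is to make them part of the tool definition itself. The sketch below is illustrative only: `Tool`, `ToolResult`, `ToolError`, and `make_flaky` are hypothetical names, and the policy (retry read-only tools, never retry write tools) is one reasonable assumption rather than a rule from any framework.

```python
from dataclasses import dataclass, field
from typing import Callable


class ToolError(Exception):
    """Raised by a tool when a call fails."""


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    read_only: bool  # read-only tools are safe to retry; write tools are not


@dataclass
class ToolResult:
    ok: bool
    value: str
    errors: list[str] = field(default_factory=list)  # every failure is recorded


def call_tool(tool: Tool, argument: str, max_retries: int = 2) -> ToolResult:
    """Call a tool with an explicit retry limit; write tools get one attempt only."""
    attempts_allowed = (1 + max_retries) if tool.read_only else 1
    errors: list[str] = []
    for _ in range(attempts_allowed):
        try:
            return ToolResult(True, tool.run(argument), errors)
        except ToolError as exc:
            errors.append(str(exc))
    return ToolResult(False, "", errors)  # caller decides on the fallback


def make_flaky(failures: int) -> Callable[[str], str]:
    """Hypothetical tool body that fails a fixed number of times, then succeeds."""
    state = {"left": failures}

    def run(argument: str) -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ToolError("timeout")
        return f"ok:{argument}"

    return run
```

Returning the error list alongside the result keeps retries visible to the agent loop and to your logs, instead of hiding them inside the tool wrapper.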

When it works well: Almost every production agent needs tool use. The pattern is foundational. The question is not whether to use it but how to scope your tool set. Narrower tool sets with clear schemas consistently outperform broader ones.

Where it breaks: When tools have overlapping semantics and the agent cannot distinguish between them. When tool schemas are underspecified and the model fills in the blanks incorrectly. When tools have side effects and the agent does not understand the cost of calling them repeatedly.
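Underspecified schemas are easier to catch if every proposed call is validated before it runs. The following is a hand-rolled sketch, assuming a simplified schema shape invented for this example (`SCHEMA`, `create_ticket`, and its parameters are all hypothetical); a production system would more likely rely on JSON Schema or a validation library.

```python
# Hypothetical tool schema: parameter name -> type, required flag, optional enum.
SCHEMA = {
    "name": "create_ticket",
    "parameters": {
        "title": {"type": str, "required": True},
        "priority": {"type": str, "required": True, "enum": ["low", "medium", "high"]},
        "assignee": {"type": str, "required": False},
    },
}


def validate_call(schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems with a proposed tool call; empty means valid."""
    problems: list[str] = []
    params = schema["parameters"]
    for name, spec in params.items():
        if name not in arguments:
            if spec.get("required"):
                problems.append(f"missing required argument: {name}")
            continue
        value = arguments[name]
        if not isinstance(value, spec["type"]):
            problems.append(
                f"{name}: expected {spec['type'].__name__}, got {type(value).__name__}"
            )
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: {value!r} not in {spec['enum']}")
    # Arguments the schema does not declare usually mean the model guessed.
    for name in arguments:
        if name not in params:
            problems.append(f"unexpected argument: {name}")
    return problems
```

Feeding the returned problems back to the model as an observation gives it a chance to correct the call rather than silently executing a guess.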

4. Planning

Planning patterns explicitly separate the phase where the agent figures out what to do from the phase where it does it. Instead of deciding action-by-action during execution as in ReAct, the agent first produces a full plan — a sequence of steps — and then executes against that plan.

There are several sub-patterns here. Plan-and-Execute generates an upfront plan, then uses sub-agents to execute each step. LATS (Language Agent Tree Search) treats planning as a tree search problem, exploring multiple branches and backtracking when branches fail. HuggingGPT routes tasks to specialized models based on a task decomposition step.

The argument for planning is clear: for complex multi-step tasks, knowing where you are going before you start walking is better than figuring it out step by step. A plan gives you a checkpoint structure. It also makes it much easier to parallelize work — if you know steps 2 and 3 do not depend on each other, you can run them simultaneously.
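The parallelization point can be made concrete by representing a plan as a dependency graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` to group steps into waves that can run concurrently. The plan contents and `run_step` are hypothetical placeholders for whatever a planner and its workers would actually produce.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical plan: each step maps to the set of steps it depends on.
PLAN = {
    "1_gather_sources": set(),
    "2_summarize_a": {"1_gather_sources"},
    "3_summarize_b": {"1_gather_sources"},
    "4_write_report": {"2_summarize_a", "3_summarize_b"},
}


def run_step(name: str) -> str:
    """Stand-in for a worker executing one step."""
    return f"done:{name}"


def execute_plan(plan: dict[str, set[str]]) -> list[list[str]]:
    """Run the plan in waves; steps in the same wave have no unmet dependencies."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves: list[list[str]] = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            list(pool.map(run_step, ready))  # run this wave concurrently, wait for all
            sorter.done(*ready)
            waves.append(ready)
    return waves
```

Here steps 2 and 3 land in the same wave because neither depends on the other. Each completed wave is also a natural checkpoint for validating progress or triggering a replan.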

When it works well: Long-horizon tasks with many interdependent steps. Tasks where you can validate the plan before executing. Research pipelines, code generation workflows, multi-document analysis.

Where it breaks: When the environment is dynamic and the plan goes stale quickly. If step 3 reveals information that makes step 4 wrong, a rigid plan-and-execute architecture has no clean way to revise. You end up needing a replanning loop, which brings back most of the complexity you were trying to avoid. Also, LLM-generated plans often look reasonable but contain subtle logical errors, and executing a bad plan is worse than adapting during execution.

5. Multi-Agent

The multi-agent pattern routes work to specialized agents rather than trying to solve everything with a single general-purpose agent. A coordinator breaks down a task and delegates to worker agents, each with a specific role, tool set, or domain expertise.

This mirrors how human teams work. You do not have one person handle legal review, technical writing, and financial modeling simultaneously. You use specialists.

Multi-agent systems introduce genuine engineering complexity: communication protocols between agents, shared state management, handling partial failures when one agent in a pipeline errors out. Frameworks like AutoGen and LangGraph have built primitives to address this, but the coordination overhead is real.
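A minimal sketch of a coordinator that delegates to role-specific workers, keeps shared state, and survives partial failures. It does not reflect how AutoGen or LangGraph implement coordination: `Coordinator`, the role names, and the worker functions are all hypothetical, and workers are plain callables standing in for full agents.

```python
from dataclasses import dataclass, field
from typing import Callable

Worker = Callable[[str], str]


@dataclass
class Coordinator:
    workers: dict[str, Worker]
    shared_state: dict[str, str] = field(default_factory=dict)
    failures: dict[str, str] = field(default_factory=dict)

    def run(self, subtasks: list[tuple[str, str]]) -> dict[str, str]:
        """Each subtask is (role, instruction). One worker failing does not stop the rest."""
        for role, instruction in subtasks:
            worker = self.workers.get(role)
            if worker is None:
                self.failures[role] = "no worker registered for this role"
                continue
            try:
                self.shared_state[role] = worker(instruction)
            except Exception as exc:  # record the failure and keep going
                self.failures[role] = f"{type(exc).__name__}: {exc}"
        return self.shared_state


def legal_worker(instruction: str) -> str:
    """Hypothetical specialist that succeeds."""
    return f"legal notes on {instruction}"


def writer_worker(instruction: str) -> str:
    """Hypothetical specialist that fails, to show partial-failure handling."""
    raise RuntimeError("model timeout")
```

Keeping `failures` separate from `shared_state` lets the coordinator decide per role whether to retry, reassign, or return a partial result.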

When it works well: Tasks that genuinely decompose into distinct specializations. Large codebases where different agents can own different subsystems. Long-running workflows where agent specialization improves reliability. Any situation where a single-agent context window would overflow given the total information required.

Where it breaks: When teams reach for multi-agent because a single-agent approach is failing, without diagnosing why. Multi-agent adds coordination complexity on top of the existing problem. If your single-agent setup is failing because the model is unreliable, a multi-agent setup will give you multiple points of unreliability. The pattern shines when the failure mode is scope, not quality.

How These Patterns Combine in Practice

Most production systems are not simply "a ReAct agent" or "a planning agent." They combine patterns at different layers.

A common architecture: an outer planning loop figures out the high-level approach, delegates to one or more ReAct-style worker agents for execution, and uses reflection at the output stage to validate results before returning them to the user. Tool use is woven throughout — workers call tools, and the coordinator might use tools to fetch context before planning.
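The layering described above can be sketched as three pluggable functions. This is a deliberately small illustration under stated assumptions: `toy_plan`, `toy_work`, and `toy_review` are hypothetical stand-ins for a planner model, a ReAct-style worker, and a reflection step, and the review here only checks that every planned step appears in the answer.

```python
from typing import Callable


def run_pipeline(
    goal: str,
    plan: Callable[[str], list[str]],
    work: Callable[[str], str],
    review: Callable[[list[str], str], bool],
) -> dict:
    """Plan first, delegate each step to a worker, then review the combined output."""
    steps = plan(goal)                         # outer planning layer
    outputs = [work(step) for step in steps]   # worker layer, one call per step
    answer = "\n".join(outputs)
    approved = review(steps, answer)           # reflection layer at the output stage
    return {"answer": answer, "approved": approved}


def toy_plan(goal: str) -> list[str]:
    """Hypothetical planner: splits the goal on the word 'and'."""
    return goal.split(" and ")


def toy_work(step: str) -> str:
    """Hypothetical worker: a real one would run its own tool-calling loop."""
    return f"[{step}] complete"


def toy_review(steps: list[str], answer: str) -> bool:
    """Hypothetical reviewer: checks that every planned step shows up in the answer."""
    return all(step in answer for step in steps)
```

Because each layer is a separate callable, you can start with only `work` doing real model calls and swap in a real planner or reviewer when observed failures justify it.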

The useful mental model is to think about where in your system failure is most likely and which pattern addresses that failure mode:

Output quality is inconsistent: add reflection
Agent loses track of the goal over long executions: add planning
Single agent is doing too many things: decompose with multi-agent
Execution is slow: look at which steps can be parallelized, a planning or multi-agent concern
Tools are being called incorrectly: fix tool schemas before changing your architecture

The sequence matters. Before reaching for architectural complexity, audit your tool schemas and your prompts. The majority of agentic failures in production are prompt and tool problems, not architecture problems. Adding a reflection loop to a system with a broken tool schema does not help.

Choosing a Starting Point

For most teams building their first production agent, the right starting point is ReAct with a small, well-scoped tool set. Get that working reliably. Then add reflection if output quality is the main issue. Add planning if long-horizon reasoning is failing. Consider multi-agent only when you have a clear specialization boundary that a single agent genuinely cannot bridge.

The temptation is to design the full architecture upfront — orchestrator, sub-agents, reflection loops, planning module, the whole thing. That almost always backfires. Each layer of complexity makes debugging harder and failure modes less obvious. Build incrementally and let observed failures drive the architecture, not theoretical elegance.

If you are working with clients or stakeholders who want a "sophisticated" system, the sophistication should show in reliability and capability, not in the number of architectural components. An agent that does three things well is more valuable than one that does twelve things erratically.

For teams moving from prototype to production, where these patterns start to matter at scale, Genta builds production-grade agentic systems while working embedded in engineering teams. The approach is to start with the simplest architecture that could plausibly work and layer in complexity as real usage reveals where the gaps are.

Where the Field Is Heading

A few directions worth watching as these patterns evolve.

Better planning with structured reasoning. Reasoning models such as OpenAI's o3, and features such as Claude's extended thinking, show that giving a model more compute at inference time improves multi-step reasoning quality. This makes planning patterns more reliable: a plan is less likely to contain logical errors if the model can think more carefully before committing to it.

Memory as a first-class pattern. The five patterns above all assume relatively stateless agents. The emerging sixth pattern is persistent memory — agents that accumulate experience across sessions and adapt based on what has and has not worked. This is still early, but projects in this space point clearly at where the field goes next.

Standardized communication protocols. The Model Context Protocol (MCP) and Google's Agent2Agent (A2A) protocol are both attempts at standardization: MCP for how agents connect to tools and data sources, A2A for how agents communicate with each other. As these become common infrastructure, the multi-agent pattern gets easier to implement reliably, because the coordination overhead shrinks when everyone speaks the same protocol.

The patterns themselves are not the destination. They are a shared vocabulary for making decisions about agent architecture — decisions that currently get made implicitly and inconsistently, often by whoever writes the first version of the agent. Making those decisions explicit, early, and for clear reasons, is most of what separates teams that ship reliable agentic systems from teams that ship impressive demos.
