April 29, 2026

10 min read

AI Agent Orchestration: LLM-Driven vs Code-Driven Patterns (And How to Choose)

The Decision Nobody Talks About

Most writing about AI agent orchestration covers the same ground: what an orchestrator agent is, why multi-agent systems are useful, which platforms exist. That's fine as background. But if you're actually building one of these systems, the question that matters most is almost never addressed directly.

Do you let the LLM decide what to do next, or do you write that logic yourself?

This is the LLM-driven vs code-driven orchestration question. It sits at the center of almost every architectural decision in a real agentic system. Get it wrong and you end up with either a brittle pipeline that breaks on edge cases, or an unpredictable agent that does surprising things in production. Getting it right means understanding the actual trade-offs, not just the marketing pitch of whichever framework you started with.

What Orchestration Actually Means in Practice

Before the distinction makes sense, it helps to be precise about what orchestration covers. In a multi-agent system, orchestration is the logic that answers: which agent runs, when, with what input, and what happens with its output? It includes routing decisions, handling failures, managing state between steps, and deciding when a task is complete.

The OpenAI Agents SDK documentation frames this well: "Orchestration refers to the flow of agents in your app. Which agents run, in what order, and how do they decide what happens next?" That last clause is where the LLM-driven vs code-driven split lives.

There are three patterns in common use, and they exist on a spectrum rather than as clean categories:

  • Sequential: Agents run in a fixed order, each passing its output to the next.

  • Parallel: Multiple agents run simultaneously on the same input, and their results are aggregated.

  • Hierarchical (dynamic): An orchestrator agent breaks a task into subtasks and delegates to specialists, deciding at runtime what to do based on context.

The first two are almost always code-driven. The third is where the LLM-driven vs code-driven question becomes real.
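To make the first two concrete, here is a minimal sketch of sequential and parallel orchestration in Python. The `call_agent` helper is a hypothetical stand-in for whatever framework or model API you actually use; only the control-flow shape matters here.

```python
import asyncio

async def call_agent(name: str, payload: str) -> str:
    # Hypothetical stand-in for one LLM-backed agent. A real version
    # would call your model API or framework of choice.
    await asyncio.sleep(0)  # simulate I/O
    return f"[{name}] {payload}"

async def sequential(payload: str) -> str:
    # Fixed order; each agent consumes the previous agent's output.
    for name in ("extract", "summarize", "format"):
        payload = await call_agent(name, payload)
    return payload

async def parallel(payload: str) -> list[str]:
    # Same input fanned out to several agents; results aggregated.
    return await asyncio.gather(
        call_agent("sentiment", payload),
        call_agent("entities", payload),
        call_agent("topics", payload),
    )

print(asyncio.run(sequential("raw document text")))
```

Notice that nothing in either function asks a model what to do next. That question only arises in the third pattern.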

LLM-Driven Orchestration

In a fully LLM-driven approach, the orchestrating agent itself decides the sequence of actions. You give it a goal and a set of available tools or sub-agents, and the model figures out what to call and in what order. This is what most people imagine when they hear "autonomous AI agent."

The appeal is obvious. You don't have to anticipate every path through a workflow. The system can handle novel inputs gracefully, because it's reasoning about the goal rather than pattern-matching against predefined steps. A well-designed LLM orchestrator can decompose complex tasks, recover from intermediate failures, and adapt its approach when a sub-agent returns unexpected results.

This is broadly the approach taken by frameworks like LangGraph when using ReAct-style execution, and it's what makes Claude with tool use feel genuinely agentic rather than just automated.
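Every routing decision lives inside the model here. A minimal sketch of that loop, with a stubbed `chat` function standing in for a real tool-calling API so the file runs:

```python
from typing import Any, Callable

def chat(messages: list[dict], tools: dict) -> dict[str, Any]:
    # Placeholder for a real chat-completions call with tool schemas.
    # Stubbed to end the loop immediately so the sketch executes.
    return {"content": "done", "tool_call": None}

def run_llm_driven(goal: str, tools: dict[str, Callable], max_steps: int = 10) -> str:
    messages: list[dict] = [{"role": "user", "content": goal}]
    for _ in range(max_steps):  # hard cap against runaway tool calls
        reply = chat(messages, tools)
        if reply["tool_call"] is None:
            return reply["content"]       # the model decided it's done
        name, args = reply["tool_call"]
        result = tools[name](**args)      # the model chose this route
        messages.append({"role": "tool", "name": name, "content": str(result)})
    raise RuntimeError("step budget exhausted")
```

The `max_steps` cap is the only deterministic guarantee in this design; everything else depends on what the model decides at each iteration.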

The problems show up in production. LLM-driven orchestration is expensive, slow, and non-deterministic. Every orchestration decision requires an inference call. Latency compounds across multiple steps. More importantly, the same input can produce different execution paths on different runs. For anything that needs to be auditable, compliant, or predictable, this is a serious problem. You also lose the ability to reason about worst-case behavior, because the search space of possible execution paths is effectively unbounded.

There's also a reliability concern that's easy to underestimate. When the orchestrator is an LLM, prompt degradation affects control flow, not just output quality. A model that starts misinterpreting its available tools or forgetting constraints mid-task doesn't just produce bad text. It routes work incorrectly, skips steps, or calls the wrong sub-agents entirely.

Code-Driven Orchestration

Code-driven orchestration means you define the workflow logic explicitly in code. The LLM is used for tasks that genuinely require language understanding or generation, but the routing, sequencing, and control flow are deterministic. You know exactly what will happen given a particular state.
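In practice this often looks like an explicit state machine. A minimal sketch, with state names invented for illustration:

```python
from enum import Enum, auto

class State(Enum):
    INTAKE = auto()
    EXTRACT = auto()
    REVIEW = auto()
    DONE = auto()

# Deterministic routing table: given a state and a step outcome, the
# next state is fixed. LLMs run *inside* steps, never between them.
TRANSITIONS = {
    (State.INTAKE, "ok"): State.EXTRACT,
    (State.EXTRACT, "ok"): State.REVIEW,
    (State.EXTRACT, "low_confidence"): State.INTAKE,  # explicit retry path
    (State.REVIEW, "ok"): State.DONE,
}

def next_state(state: State, outcome: str) -> State:
    # A KeyError here is an unanticipated path, surfaced in testing
    # rather than discovered as surprising behavior in production.
    return TRANSITIONS[(state, outcome)]
```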

Microsoft's Azure Architecture documentation on agent design patterns describes this more predictable approach as the recommended starting point, with dynamic LLM-driven routing introduced only afterward.

The advantages mirror the LLM-driven approach's weaknesses. Cost per workflow run is lower because you're not burning inference tokens on routing decisions. Execution is faster. The system is testable in the normal engineering sense: you can write unit tests, integration tests, and end-to-end tests against a fixed state machine. Debugging is straightforward because there's no stochastic element in the control flow.

The limitation is brittleness on the edges. Code-driven orchestration handles the cases you designed for. When a real-world input falls outside the anticipated paths, the system either fails or falls back to a generic handler. It doesn't adapt. This matters more in some domains than others. A document processing pipeline that runs the same set of steps every time is a perfect candidate for code-driven orchestration. A research assistant that needs to dynamically decide whether to search the web, query a database, or ask a clarifying question is not.

The Real Decision Framework

The question isn't which approach is better in the abstract. It's which to use where, in the same system.

Most production agentic systems worth building use both. The outer structure (top-level routing and workflow management) is often code-driven because it maps to known business processes with predictable inputs and outputs. The inner execution of specific tasks, where the LLM's reasoning actually adds value, is LLM-driven at a localized scope.

Think of it this way: code orchestrates the pipeline, LLMs execute the steps within it.

A few signals point clearly toward code-driven orchestration for a given decision point:

  • The possible states are enumerable and you can name them all

  • The workflow must be auditable or compliant (financial services, healthcare, legal)

  • The same input should always produce the same execution path

  • Cost and latency budgets are tight

  • The team needs to debug and trace failures systematically

LLM-driven orchestration makes sense when:

  • The task genuinely requires reasoning about context to decide what to do next

  • Inputs are highly variable and you can't enumerate the paths in advance

  • The cost of building and maintaining explicit routing logic exceeds the cost of inference

  • Graceful degradation on unexpected inputs matters more than perfect predictability

One practical heuristic that holds up in real systems: if you can write the routing logic as a decision tree and it fits on one page, write it in code. If you can't, and the branching is fundamentally semantic rather than structural, that's a candidate for LLM-driven routing.

Where Most Teams Go Wrong

The most common mistake isn't choosing the wrong approach. It's applying one approach universally across an entire system without thinking about which parts actually benefit from LLM reasoning.

Teams new to agentic systems often start with fully LLM-driven orchestration because it's easier to prototype. You describe what you want in a system prompt, list the available tools, and let the model figure it out. This works surprisingly well in demos. Then it goes to production and the failure modes compound: inconsistent behavior across similar inputs, runaway tool calls, context window pressure causing the orchestrator to lose track of constraints, costs that scale badly with usage.

The correction is usually to harden the outer structure with code while preserving LLM reasoning for the steps that genuinely need it. This is messier to refactor than to design correctly from the start, which is an argument for thinking about the boundary early in the build.

The other common mistake is treating orchestration as a framework selection problem rather than an architectural one. Picking LangGraph vs CrewAI vs the OpenAI Agents SDK matters less than deciding where the LLM-driven vs code-driven boundary should sit. Most of these frameworks support both approaches. The question is how you use them.

Hybrid Patterns That Actually Work

The architectures that hold up in production are genuinely hybrid, but not in a vague sense. They're hybrid in a specific way: deterministic orchestration at the workflow level, with LLM-driven execution at the task level, and narrow LLM-driven routing only at decision points where the semantic complexity justifies it.

Plan-then-execute

The LLM generates a plan, typically as a structured list of steps, before any execution begins. A code-driven executor then runs those steps sequentially with minimal LLM involvement in routing. This gives you the flexibility of LLM planning with the predictability of deterministic execution. It also makes the system much easier to test: you can test the planner separately from the executor, and you can inject plans directly to test specific execution paths without going through the full planning process.
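A hedged sketch of that split, with the planner stubbed (a real planner is a single LLM call that returns a JSON array of step names):

```python
import json
from typing import Callable

# Registry of deterministic step implementations. The plan may only
# reference these names; anything else is rejected before execution.
STEPS: dict[str, Callable[[dict], dict]] = {
    "fetch_record": lambda ctx: {**ctx, "record": "..."},
    "summarize": lambda ctx: {**ctx, "summary": "..."},
    "notify": lambda ctx: ctx,
}

def plan(goal: str) -> list[str]:
    # Placeholder for one LLM call returning a JSON array of step
    # names. Stubbed with a fixed plan so the sketch runs.
    return json.loads('["fetch_record", "summarize", "notify"]')

def execute(step_names: list[str], ctx: dict) -> dict:
    for name in step_names:
        if name not in STEPS:
            raise ValueError(f"plan referenced unknown step: {name}")
        ctx = STEPS[name](ctx)  # deterministic: no LLM in the routing
    return ctx

# Tests can inject plans directly and skip the planner entirely:
assert "summary" in execute(["fetch_record", "summarize"], {"id": 42})
```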

Classifier front-end

Rather than having a general orchestrator reason about every input, you use a fast, cheap classification step to route inputs to specialized sub-pipelines, each of which is fully code-driven. The classifier adds one LLM call per request but dramatically reduces the complexity of downstream orchestration. This is particularly useful in customer-facing applications where input variety is high but the set of genuinely useful responses is bounded. Anthropic's guide on building effective agents describes this as the "routing" workflow pattern, and it's one of the most reliable approaches for production systems.
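A minimal sketch of the shape, with the classifier stubbed as keyword matching so it runs without a model (a real classifier is one cheap LLM call constrained to a fixed label set):

```python
def run_billing_pipeline(msg: str) -> str:
    return "billing pipeline handled it"  # fully code-driven branch

def run_technical_pipeline(msg: str) -> str:
    return "technical pipeline handled it"

def run_fallback(msg: str) -> str:
    return "routed to a human"

def classify(message: str) -> str:
    # Stub: keyword matching. Swap in one cheap LLM call that must
    # return exactly one label from a fixed set.
    if "invoice" in message.lower() or "charge" in message.lower():
        return "billing"
    if "error" in message.lower() or "crash" in message.lower():
        return "technical"
    return "other"

def handle(message: str) -> str:
    pipelines = {
        "billing": run_billing_pipeline,
        "technical": run_technical_pipeline,
    }
    return pipelines.get(classify(message), run_fallback)(message)

print(handle("Why was my card charged twice?"))  # -> billing pipeline
```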

Hierarchical with guarded handoffs

The orchestrator agent can delegate to sub-agents, but handoffs happen through typed interfaces defined in code rather than free-form LLM output. The orchestrator decides what to delegate (LLM-driven), but the structure of how work is passed and how results come back is fully defined (code-driven). This preserves the flexibility of dynamic delegation while giving you a stable contract between components that you can test and monitor.
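Here is a sketch of what a typed handoff contract can look like using plain dataclasses; the field names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResearchRequest:
    # The orchestrator LLM decides *whether* to delegate; the shape
    # of what crosses the boundary is fixed here, in code.
    question: str
    max_sources: int = 5

@dataclass(frozen=True)
class ResearchResult:
    answer: str
    sources: list[str]
    confidence: float

def delegate_research(req: ResearchRequest) -> ResearchResult:
    # Placeholder sub-agent. A real one parses the model's free-form
    # output into ResearchResult and rejects anything that doesn't fit.
    result = ResearchResult(answer="...", sources=[], confidence=0.9)
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("sub-agent returned an out-of-range confidence")
    return result
```

Because the contract is code, you can validate, version, and monitor it independently of whatever the orchestrator model happens to be doing.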

Observability Is Not Optional

Regardless of which approach you choose, you need visibility into what's actually happening at runtime. This is more important in agentic systems than in traditional software because the failure modes are subtler. A workflow can technically complete, return a plausible-looking result, and still have taken the wrong path through a multi-agent system.

What to instrument: which agents ran, in what order, what inputs they received, what they returned, how long each step took, and where in the orchestration flow any errors occurred. For LLM-driven components specifically, log the full prompt and completion at each orchestration decision point so you can reconstruct what the model was reasoning about when something went wrong.
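One lightweight way to get this without a vendor is a decorator that emits one structured log line per step. A sketch, assuming plain stdlib logging:

```python
import json
import logging
import time
import uuid
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestration")

def traced(step_name: str):
    # Emits one structured log line per step: which agent ran, its
    # input and output, duration, and any error, keyed by run_id.
    def wrap(fn):
        @wraps(fn)
        def inner(payload: str, run_id: str | None = None) -> str:
            run_id = run_id or str(uuid.uuid4())
            start = time.perf_counter()
            try:
                result = fn(payload)
                log.info(json.dumps({
                    "run_id": run_id, "step": step_name,
                    "input": payload, "output": result,
                    "ms": round((time.perf_counter() - start) * 1000, 1),
                }))
                return result
            except Exception as exc:
                log.error(json.dumps({
                    "run_id": run_id, "step": step_name,
                    "input": payload, "error": repr(exc),
                }))
                raise
        return inner
    return wrap

@traced("summarize")
def summarize(payload: str) -> str:
    return payload[:80]  # stand-in for an LLM-backed step

summarize("a long document to summarize")
```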

Without this, debugging production failures in any multi-agent system is extremely difficult. The observability and reliability engineering considerations covered in Genta's guide to AI observability for agentic systems go deeper on the instrumentation side of this.

Starting Points That Don't Set You Up to Fail

If you're starting a new agentic system: begin with code-driven orchestration and introduce LLM-driven routing only when you hit a specific problem that code can't handle elegantly. This is the principle in Microsoft's agent design pattern guidance, and it's right. The systems that get rebuilt from scratch after six months in production are almost always the ones that started with maximum autonomy and worked backward from there.

Define your LLM-driven vs code-driven boundary explicitly, document it, and treat it as a first-class architectural decision. When that boundary starts to blur as the system evolves, that's a signal to revisit it, not to let it drift.

The framework you pick (LangGraph, CrewAI, the OpenAI Agents SDK, or something custom) is genuinely secondary. Once you know where your boundary sits and what you need to observe, most frameworks are capable enough. The differences are in ergonomics and ecosystem fit, not in whether they can implement the architecture you actually need.

We’re Here to Help

Ready to transform your operations? We're here to help. Contact us today to learn more about our innovative solutions and expert services.
