Agent Planning: From Goals to Executable Actions in Production AI Agents
What Is Agent Planning
Agent planning is the process of transforming a high‑level, ambiguous user goal into a concrete, ordered sequence of executable actions (tool calls, API requests, or sub‑tasks). It is the component that gives an agent its autonomy: it lets the agent break down “Research Q3 sales trends” into query_database → analyze_numbers → generate_chart → write_summary.
Unlike a traditional program with hardcoded steps, an agent’s planner is driven by an LLM and can adapt when new information arrives or steps fail. Planning turns a reactive chatbot into a proactive, goal‑oriented system.
Examples across domains:
| Domain | User Goal | Plan (actions) |
|---|---|---|
| Research | “Compare three cloud providers’ pricing for GPU instances” | search_aws_pricing → search_azure_pricing → search_gcp_pricing → compare_in_table → summarise |
| Coding | “Refactor auth.py to use async/await” | read_file → identify_sync_functions → generate_async_versions → write_backup → replace_file → run_tests |
| Customer Support | “Refund shipping on order #ORD‑1234 because it’s late” | get_order_status → if delayed then request_refund → send_email_confirmation |
| Business Automation | “Onboard new employee: create Slack, GitHub, email” | create_slack_account → create_github_account → create_email_alias → send_welcome_message |
Why Planning Matters
Planning is one of the defining characteristics that separates AI agents from simpler LLM applications.
| Capability | Traditional Chatbot | RAG Application | AI Agent (with planning) |
|---|---|---|---|
| Handles multi‑step tasks | No (single Q&A) | No (one retrieval + answer) | Yes – decomposes goal into steps |
| Uses tools sequentially | No (or one fixed tool) | No (only retrieval) | Yes – orchestrates multiple tools |
| Adapts to intermediate results | No | No | Yes – replans based on tool outputs |
| Recovers from failures | No (just error message) | No | Yes – alternative tools or new plan |
| Visible to user | Single answer | Single answer | Can show plan and progress |
Without planning, an agent is just a fancy router: it can call one tool, get a result, and answer. With planning, it can handle tasks like “Find me the cheapest flight from NYC to London on June 15th, book it if under $500, and add to my calendar.” That requires sequencing: search flights → check price → conditionally call booking → call calendar API.
Planning in the Agent Architecture
Planning is a distinct component in the agent runtime, sandwiched between memory retrieval and tool execution.
The planner does not execute actions. It produces a plan – a structured representation (list, DAG, or hierarchical task network). The workflow engine executes the plan step by step, feeding results back to the reasoning engine, which may request a replan.
The Agent Planning Lifecycle
Stage 1: Goal Understanding
Parse the user request into a structured goal (e.g., {"type": "book_flight", "constraints": {"max_price": 500, "origin": "NYC", "destination": "LON", "date": "2025-06-15"}}).
Stage 2: Task Decomposition
Break the goal into atomic tasks. Example: search_flights → filter_by_price → select_best_option → book_flight.
Stage 3: Action Generation
For each task, define the required action. This may be a tool call, a sub‑plan, or a conditional branch.
Stage 4: Tool Selection
Map actions to concrete tools from the agent’s registry. The planner must respect each tool’s schema (input and output types).
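As a rough illustration of respecting tool schemas, the sketch below checks a plan against a registry before anything runs. The registry contents, the `validate_plan` name, and the rule that strings starting with `$step` are deferred placeholders are assumptions made for this example, not the API of any particular framework.

```python
from typing import Any

# Hypothetical registry: tool name -> required parameter names and their types.
TOOL_REGISTRY: dict[str, dict[str, type]] = {
    "search_flights": {"origin": str, "dest": str, "date": str},
    "filter_flights": {"max_price": int},
    "book_flight": {"flight_id": str},
}


def validate_plan(plan: dict[str, Any], registry: dict[str, dict[str, type]]) -> list[str]:
    """Return a list of problems; an empty list means the plan passed."""
    errors: list[str] = []
    seen_ids: set[Any] = set()
    for step in plan.get("steps", []):
        step_id, action = step.get("id"), step.get("action")
        schema = registry.get(action)
        if schema is None:
            errors.append(f"step {step_id}: unknown tool '{action}'")
            seen_ids.add(step_id)
            continue
        params = step.get("params", {})
        for name, expected in schema.items():
            if name not in params:
                errors.append(f"step {step_id}: missing parameter '{name}'")
                continue
            value = params[name]
            # Assumed convention: "$stepN..." strings are filled in at run time.
            is_placeholder = isinstance(value, str) and value.startswith("$step")
            if not is_placeholder and not isinstance(value, expected):
                errors.append(f"step {step_id}: '{name}' should be {expected.__name__}")
        for dep in step.get("depends_on", []):
            if dep not in seen_ids:
                errors.append(f"step {step_id}: depends on unknown or later step {dep}")
        seen_ids.add(step_id)
    return errors
```

A caller would typically refuse to execute any plan for which the returned list is non-empty and pass the messages back to the planner as feedback.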
Stage 5: Execution (by workflow engine)
The plan is passed to the workflow engine, which executes steps sequentially, or in parallel where steps do not depend on each other.
Stage 6: Validation
After each step, the reasoning engine checks whether the outcome matches expectations. If not, it triggers replanning.
Stage 7: Replanning
The planner generates an updated plan from the current state, skipping completed steps or choosing alternative tools where needed.
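The validation and replanning stages can be sketched as a small decision function. This is only one possible policy: the error labels ("transient", "schema", "no_results"), the `Decision` enum, and the default limits are hypothetical choices for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    CONTINUE = "continue"
    RETRY = "retry"
    REPLAN = "replan"
    ABORT = "abort"


@dataclass
class StepOutcome:
    ok: bool
    error_kind: Optional[str] = None  # assumed labels: "transient", "schema", "no_results"
    attempts: int = 1                 # how many times this step has been tried


def decide(outcome: StepOutcome, replans_used: int,
           max_retries: int = 2, max_replans: int = 3) -> Decision:
    """Pick the next move after a step finishes."""
    if outcome.ok:
        return Decision.CONTINUE
    # Transient failures (timeouts, rate limits) are retried before replanning.
    if outcome.error_kind == "transient" and outcome.attempts <= max_retries:
        return Decision.RETRY
    # Anything else goes back to the planner, up to a fixed budget.
    if replans_used < max_replans:
        return Decision.REPLAN
    return Decision.ABORT
```

Keeping this logic outside the LLM makes the retry and replan budgets easy to audit and test.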
Planning Strategies
| Strategy | Description | When to Use | Example |
|---|---|---|---|
| Single‑step planning | One action, no sequencing. | Trivial tasks, single tool. | “What’s the weather?” → call get_weather. |
| Multi‑step planning | Linear sequence of actions. | Predictable, ordered tasks. | “Create user → send welcome email → log event”. |
| Hierarchical planning | Parent plan with sub‑plans (tasks within tasks). | Complex goals with reusable subtasks. | “Write report” → subtask “gather data” → sub‑subtask “query DB”. |
| Dynamic planning | Plan is generated incrementally as execution proceeds. | Unknown tool outputs; exploration. | Research agent that reads a document, then decides next source. |
| Replanning | Modify existing plan based on new information or failures. | Unreliable APIs, user interrupts, conditional logic. | Price tool returns higher than expected → plan to look for discounts. |
| Conditional planning | Plans with branches (if‑then‑else). | Decisions depend on tool results. | “If stock > 0 then order else notify me”. |
Planning and Tool Calling
The planner drives tool selection, but the two are distinct: planning decides what to do; tool calling handles how each action is executed.
Example integration:
// Plan generated by planner
{
"plan_id": "plan_001",
"steps": [
{"id": 1, "action": "search_flights", "params": {"origin": "NYC", "dest": "LON", "date": "2025-06-15"}},
{"id": 2, "action": "filter_flights", "params": {"max_price": 500}, "depends_on": [1]},
{"id": 3, "action": "book_flight", "params": {"flight_id": "$step2.result.selected_id"}, "depends_on": [2], "condition": "$step2.result.has_flights"}
]
}
Responsibilities:
| Component | Role |
|---|---|
| Planner | Chooses tool names and fills required parameters (using placeholders for results of previous steps). |
| Tool calling layer | Validates parameters against tool schema, executes, returns structured result. |
| Workflow engine | Resolves dependencies, injects previous step outputs into parameters. |
Common mistake: The planner assumes every tool will succeed and includes no fallback. Where a failure is plausible, add a conditional branch (e.g., if search_flights fails, try search_alternative_flights).
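To make the workflow engine's role concrete, here is a minimal sketch of placeholder resolution and conditional execution for a plan shaped like the one above. It assumes that both parameters and conditions use the `$stepN.result.field` syntax, that steps are listed in dependency order, and that the filter step receives the search results through an extra `flights` parameter; the stub tools are hypothetical.

```python
import re
from typing import Any, Callable

PLACEHOLDER = re.compile(r"^\$step(\d+)\.result((?:\.\w+)*)$")


def resolve(value: Any, results: dict[int, Any]) -> Any:
    """Replace "$stepN.result.a.b" with the matching value from earlier results."""
    if not isinstance(value, str):
        return value
    match = PLACEHOLDER.match(value)
    if match is None:
        return value
    current = results[int(match.group(1))]
    for key in filter(None, match.group(2).split(".")):
        current = current[key]
    return current


def run_plan(plan: dict, tools: dict[str, Callable[..., Any]]) -> dict[int, Any]:
    results: dict[int, Any] = {}
    for step in plan["steps"]:  # assumes steps are listed in dependency order
        if any(dep not in results for dep in step.get("depends_on", [])):
            continue  # a dependency was skipped, so this step is skipped too
        condition = step.get("condition")
        if condition is not None and not resolve(condition, results):
            continue
        params = {k: resolve(v, results) for k, v in step["params"].items()}
        results[step["id"]] = tools[step["action"]](**params)
    return results


# Hypothetical stub tools standing in for real API calls.
def search_flights(origin, dest, date):
    return [{"id": "BA112", "price": 480}, {"id": "VS4", "price": 610}]


def filter_flights(flights, max_price):
    cheap = [f for f in flights if f["price"] <= max_price]
    return {"has_flights": bool(cheap), "selected_id": cheap[0]["id"] if cheap else None}


def book_flight(flight_id):
    return {"booked": flight_id}


TOOLS = {"search_flights": search_flights, "filter_flights": filter_flights,
         "book_flight": book_flight}


def make_plan(max_price: int) -> dict:
    return {"steps": [
        {"id": 1, "action": "search_flights",
         "params": {"origin": "NYC", "dest": "LON", "date": "2025-06-15"}},
        {"id": 2, "action": "filter_flights",
         "params": {"flights": "$step1.result", "max_price": max_price}, "depends_on": [1]},
        {"id": 3, "action": "book_flight",
         "params": {"flight_id": "$step2.result.selected_id"},
         "depends_on": [2], "condition": "$step2.result.has_flights"},
    ]}
```

Running `run_plan(make_plan(500), TOOLS)` books the cheaper flight, while `make_plan(100)` leaves step 3 unexecuted because its condition resolves to false.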
Planning and Memory
Planning heavily depends on memory – both for context and for storing the plan itself.
- Short‑term memory – Provides recent conversation context (e.g., user previously said “I prefer early flights”). The planner must consider this when generating steps.
- Long‑term memory – Stores successful plans for similar goals. A planner can retrieve a plan template instead of generating from scratch.
- Plan store – A dedicated state component that holds the current plan, which steps are completed, their outputs, and any pending replan requests.
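Context window consideration: Keep the plan's footprint in the prompt small. A long plan, together with accumulated step outputs, can crowd the context window if it is serialised naively into every LLM call. Store the plan in the state manager and inject only the next step, plus the outputs it depends on, into the LLM context.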
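One way to keep the plan out of the prompt is to hold it in a state object and render only a compact view on demand. The `PlanState` class, its field names, and the 300-character truncation below are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PlanState:
    steps: list[dict]
    outputs: dict[int, Any] = field(default_factory=dict)  # step id -> tool result

    def next_step(self) -> Optional[dict]:
        """Return the first step that has no recorded output yet."""
        for step in self.steps:
            if step["id"] not in self.outputs:
                return step
        return None

    def prompt_context(self, max_chars: int = 300) -> str:
        """Render only what the LLM needs: the next step and its dependency outputs."""
        step = self.next_step()
        if step is None:
            return "All steps completed."
        deps = {
            dep: json.dumps(self.outputs[dep])[:max_chars]  # truncate large outputs
            for dep in step.get("depends_on", [])
            if dep in self.outputs
        }
        return json.dumps({
            "completed_step_ids": sorted(self.outputs),
            "next_step": step,
            "dependency_outputs": deps,
        })
```

Later steps and full tool outputs stay in the state object, so prompt size stays roughly constant as the plan grows.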
Planning Patterns
Plan‑and‑Execute
The classic pattern: generate full plan upfront, then execute each step sequentially.
Advantages: Simple, transparent, easy to debug.
Disadvantages: Cannot adapt to unexpected tool outputs; if step 2 fails, the rest of the plan may be useless.
Implementation: One LLM call for planning, then a loop.
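A minimal sketch of that plan-then-loop shape is shown below. The planner is passed in as a plain function so the example runs without an LLM; `stub_planner` and the two lambda tools are hypothetical stand-ins.

```python
from typing import Any, Callable

Tool = Callable[..., Any]


def plan_and_execute(goal: str,
                     planner: Callable[[str], list[dict]],
                     tools: dict[str, Tool]) -> list[dict]:
    """Plan once, then run each step in order; stop at the first failure."""
    plan = planner(goal)  # the single planning call (an LLM call in a real agent)
    trace: list[dict] = []
    for step in plan:
        try:
            result = tools[step["action"]](**step.get("params", {}))
            trace.append({"action": step["action"], "ok": True, "result": result})
        except Exception as exc:  # no replanning in this pattern: record and stop
            trace.append({"action": step["action"], "ok": False, "error": str(exc)})
            break
    return trace


# Hypothetical stand-ins for an LLM planner and two tools.
def stub_planner(goal: str) -> list[dict]:
    return [
        {"action": "create_user", "params": {"name": "Ada"}},
        {"action": "send_welcome_email", "params": {"name": "Ada"}},
    ]


TOOLS: dict[str, Tool] = {
    "create_user": lambda name: f"user:{name}",
    "send_welcome_email": lambda name: f"emailed:{name}",
}
```

The early `break` makes the pattern's weakness visible: once a step fails, the remaining steps are simply abandoned.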
ReAct (Reasoning + Acting)
Interleaves planning and execution: at each turn, the LLM outputs a thought (reasoning), then an action, then observes the result, and repeats.
Advantages: Highly adaptive, good for exploration.
Disadvantages: More LLM calls (higher cost, latency), can loop.
Implementation: Prompt instructs model to output Thought: ... Action: tool_name[params].
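The loop can be sketched as follows, using the Thought / Action: tool_name[params] format described above. The "Final Answer:" marker, the turn limit, and the scripted stand-in for the model are assumptions for the example; a real agent would call an LLM where `scripted_llm` is used.

```python
import re
from typing import Callable

ACTION = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react(goal: str,
          llm: Callable[[str], str],
          tools: dict[str, Callable[[str], str]],
          max_turns: int = 5) -> str:
    """Alternate model output and tool observations until an answer or the turn limit."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_turns):
        reply = llm(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:  # assumed stop marker
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION.search(reply)
        if match is None:
            transcript += "Observation: could not parse an action\n"
            continue
        name, arg = match.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: turn limit reached"


# Scripted stand-in for a model: act first, answer once an observation exists.
def scripted_llm(transcript: str) -> str:
    if "Observation:" not in transcript:
        return "Thought: I need the weather.\nAction: get_weather[London]"
    return "Thought: I have what I need.\nFinal Answer: It is rainy in London."


WEATHER_TOOLS = {"get_weather": lambda city: "rainy"}
```

The `max_turns` cap is what keeps a model that never emits a final answer from looping indefinitely.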
Tree of Thoughts (ToT)
Explores multiple reasoning paths in parallel, evaluating each branch, then chooses the best.
Advantages: Better for complex reasoning (math, puzzles).
Disadvantages: Very high token cost, complex to implement.
Implementation: Multiple LLM calls to generate candidate next steps, score them, and prune.
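The generate, score, and prune cycle can be illustrated with a small beam search. In a real system `propose` and `score` would each be LLM calls; here they are plain functions over a toy number puzzle (reach 10 from 2 using +1 or ×2), chosen only so the sketch runs on its own.

```python
from typing import Callable, Optional


def tree_of_thoughts(root: int,
                     propose: Callable[[int], list[int]],
                     score: Callable[[int], float],
                     is_goal: Callable[[int], bool],
                     beam_width: int = 2,
                     max_depth: int = 4) -> Optional[list[int]]:
    """Beam search over candidate next steps; returns the first path that reaches the goal."""
    frontier = [[root]]
    for _ in range(max_depth):
        # Expand every kept path with every proposed next state.
        candidates = [path + [nxt] for path in frontier for nxt in propose(path[-1])]
        if not candidates:
            break
        for path in candidates:
            if is_goal(path[-1]):
                return path
        # Keep only the best-scoring branches (pruning).
        candidates.sort(key=lambda p: score(p[-1]), reverse=True)
        frontier = candidates[:beam_width]
    return None


TARGET = 10
path = tree_of_thoughts(
    root=2,
    propose=lambda x: [x + 1, x * 2],   # stand-in for "generate candidate next steps"
    score=lambda x: -abs(TARGET - x),   # stand-in for "evaluate each branch"
    is_goal=lambda x: x == TARGET,
)
```

Every level costs one proposal call per kept branch plus one scoring call per candidate, which is where the high token cost comes from.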
Reflection / Self‑Correction
After plan execution (or after each step), the agent reflects on the outcome and refines the plan for the next iteration.
Advantages: Improves over time, can learn from mistakes.
Disadvantages: Requires storing feedback in memory.
Implementation: After plan completion, call LLM with prompt: “Given the outcome, how could the plan be improved?”
Workflow‑Based Planning
A hybrid approach: a fixed workflow DAG defines the high‑level steps, but each node contains an agent with local planning (e.g., a “research” node that uses ReAct).
Advantages: Combines predictability with flexibility.
Disadvantages: More complex to design.
Implementation: Use LangGraph with conditional edges and sub‑graphs.
| Pattern | LLM Calls | Adaptivity | Cost | Best for |
|---|---|---|---|---|
| Plan‑and‑Execute | 1 (plan) + 1 (final), regardless of step count | Low | Low | Well‑understood, deterministic tasks |
| ReAct | 1+ per action | High | Medium | Dynamic tasks, tool use |
| Tree of Thoughts | Many | High | Very high | Reasoning‑heavy, open‑ended |
| Reflection | Plan + reflection | Medium | Medium | Iterative improvement |
| Workflow‑based | Varies | Medium | Medium | Enterprise processes with sub‑tasks |
Planning in Popular Agent Frameworks
| Framework | Planning Model | Strengths | Weaknesses | Use Case |
|---|---|---|---|---|
| LangGraph | Graph‑based + conditional edges; planner can be any node. | Full control, checkpointing, replanning as graph cycle. | Steeper learning curve. | Complex, long‑running workflows. |
| CrewAI | Sequential or hierarchical tasks; built‑in Planner agent. | Simple, role‑based planning. | Limited dynamic replanning; not good for loops. | Linear business processes. |
| AutoGen | Conversation‑driven; agents plan via chat. | Natural multi‑agent planning. | Inefficient for pure plan‑execute; high token usage. | Multi‑agent deliberation. |
| OpenAI Agents SDK | Agent loop with tool calls and handoffs; no explicit planner – the model chooses the next tool call on each turn. | Simple, low latency. | No first‑class plan object to validate, cache, or show to the user. | Tool‑using assistants and agent handoffs. |
| Semantic Kernel | Plan object with steps; sequential planner via LLM (newer releases favour automatic function calling over the older planner classes). | Good for .NET/Java, enterprise. | Basic planning, no built‑in replanning. | Simple automation. |
Recommendation: For production systems that require robust planning (replanning, conditional branching, human‑in‑the‑loop), use LangGraph. For quick prototypes with linear plans, CrewAI or Semantic Kernel suffice.
Production Planning Challenges
| Challenge | Description | Mitigation |
|---|---|---|
| Hallucinated plans | LLM invents non‑existent tools or steps. | Validate plan against tool registry; reject plans with unknown actions. |
| Excessive tool usage | Plan includes redundant or inefficient steps. | Limit plan depth (e.g., a hard cap of 10–15 steps). Use plan caching. |
| Infinite loops | Replanning never converges. | Set max replan attempts (3). Detect repeated states. |
| Cost escalation | Planning itself requires LLM calls. Each replan adds cost. | Use cheaper model for planning (e.g., GPT‑3.5). Cache identical plans. |
| Context drift | Over many turns, the plan grows and overwhelms context. | Summarise completed steps; only keep next step and dependencies in prompt. |
| Dependency failures | A tool fails, breaking all downstream steps. | Include fallback steps in plan (e.g., if tool A fails, use tool B). |
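The infinite-loop mitigation (a replan budget plus repeated-state detection) might look like the sketch below. The `ReplanGuard` class and the choice to fingerprint the plan together with the completed step ids are assumptions for this example.

```python
import hashlib
import json


class ReplanGuard:
    """Refuses a replan when the budget is spent or the same state comes back."""

    def __init__(self, max_replans: int = 3) -> None:
        self.max_replans = max_replans
        self.replans = 0
        self.seen: set[str] = set()

    @staticmethod
    def fingerprint(plan: list[dict], completed: list[int]) -> str:
        # Canonical JSON so that key order does not change the hash.
        payload = json.dumps({"plan": plan, "completed": sorted(completed)}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def allow(self, plan: list[dict], completed: list[int]) -> bool:
        """Return True if this proposed replan may proceed."""
        key = self.fingerprint(plan, completed)
        if key in self.seen or self.replans >= self.max_replans:
            return False
        self.seen.add(key)
        self.replans += 1
        return True
```

When `allow` returns False, the agent would typically stop and escalate to the user rather than ask the planner again.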
Evaluating Agent Planning
You cannot improve what you do not measure. Evaluate planning separately from execution.
| Metric | Definition | How to Measure |
|---|---|---|
| Plan accuracy | % of generated plans that are valid (all tools exist, parameters correct). | Parse plan JSON; check tool registry; count errors. |
| Task completion rate | % of user goals fully satisfied by following the plan. | End‑to‑end test with expected outcome. |
| Plan efficiency | Number of steps taken vs. optimal (human‑curated) plan. | Compare step count; compute overhead ratio. |
| Replan success rate | Share of replan attempts that end in a successful task, out of all replan attempts. | Trace logs: replan_triggered → final_success. |
| Cost per plan | Total tokens (LLM calls) used during planning phase. | Accumulate token usage for planning prompts. |
| Planning latency | Time from goal input to first executable step. | Measure time before tool execution begins. |
Example evaluation set:
| Goal | Expected plan | Generated plan | Correct? | Steps | Tokens |
|---|---|---|---|---|---|
| “Send email to [email protected]” | send_email([email protected]) | search_contact(john) → send_email | No (over‑complicated) | 2 vs 1 | 450 |
| “Order status for #123” | get_order_status(123) | get_order_status(123) | Yes | 1 | 210 |
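As a sketch of how such a set could be scored offline, the function below computes plan accuracy (all tools exist), exact-match rate, step overhead, and mean planning tokens. The case format, the `evaluate` name, and reducing each plan to a list of tool names are simplifying assumptions; the two cases mirror the rows above.

```python
def evaluate(cases: list[dict], known_tools: set[str]) -> dict[str, float]:
    """Score generated plans against expected plans, ignoring parameters."""
    n = len(cases)
    # A plan counts as valid when every tool it names exists in the registry.
    valid = sum(all(tool in known_tools for tool in c["generated"]) for c in cases)
    exact = sum(c["generated"] == c["expected"] for c in cases)
    # Overhead ratio: generated step count divided by the curated step count.
    overhead = sum(len(c["generated"]) / len(c["expected"]) for c in cases) / n
    tokens = sum(c["tokens"] for c in cases) / n
    return {
        "plan_accuracy": valid / n,
        "exact_match_rate": exact / n,
        "mean_overhead_ratio": overhead,
        "mean_planning_tokens": tokens,
    }


CASES = [
    {"expected": ["send_email"], "generated": ["search_contact", "send_email"], "tokens": 450},
    {"expected": ["get_order_status"], "generated": ["get_order_status"], "tokens": 210},
]
KNOWN_TOOLS = {"send_email", "search_contact", "get_order_status"}
report = evaluate(CASES, KNOWN_TOOLS)
```

Here both plans are valid, but only one matches the curated plan exactly, which is the kind of gap that plan accuracy alone would hide.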
Best Practices
- Always validate the plan before execution – Check that every tool exists, required parameters are present, and types match. Reject invalid plans.
- Set a maximum plan depth – Hard limit of 10–15 steps. Beyond that, ask the user to refine the goal.
- Implement plan caching – Store successful plans keyed by goal intent (embedding similarity). Retrieve and reuse.
- Use a cheaper LLM for planning – GPT‑3.5‑turbo or Llama 3 8B is often enough; reserve GPT‑4 for final reasoning.
- Include conditional branches – Teach the planner to output condition fields (e.g., if result.success else ...).
- Replan only when necessary – Not every tool failure needs a replan. Transient errors: retry. Schema errors: replan.
- Keep the plan in state, not in prompts – Store plan in a structured object; inject only the next step into the LLM context.
- Monitor planning failures – Alert when plan validation fails more than 5% of requests. That signals prompt or tool registry issues.
- Test planning in isolation – Mock tool execution; verify that the planner produces correct plans for a suite of goals.
- Provide tool documentation in the planning prompt – Include tool names, descriptions, parameter schemas, and example calls. The better the documentation, the better the plan.
- Version your planning prompts – Changes to prompts can drastically change plan quality. A/B test new prompts before full rollout.
- Log every plan – Save the generated plan, the final executed steps, and any replans. This is gold for debugging.
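The plan-caching practice can be sketched with a similarity lookup. To stay self-contained, the example substitutes a bag-of-words vector for a real embedding model; the `embed` function, the `PlanCache` class, and the 0.75 threshold are all assumptions to be replaced and tuned in practice.

```python
import math
from collections import Counter
from typing import Any, Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class PlanCache:
    def __init__(self, threshold: float = 0.75) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, Any]] = []

    def put(self, goal: str, plan_template: Any) -> None:
        self.entries.append((embed(goal), plan_template))

    def get(self, goal: str) -> Optional[Any]:
        """Return the most similar stored template, or None if nothing is close enough."""
        query = embed(goal)
        best = max(self.entries, key=lambda entry: cosine(query, entry[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None


cache = PlanCache()
cache.put("gather info on Acme Corp",
          [{"action": "web_search", "params": {"query": "{competitor} product launch"}}])
hit = cache.get("gather info on Globex Corp")   # similar goal, reuses the template
miss = cache.get("what is the weather today")   # unrelated goal, falls back to the planner
```

A cache hit still needs its template parameters filled in and should pass the same plan validation as a freshly generated plan.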
Common Planning Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| Overly complex plans | High latency, token cost, more failure points. | Limit depth; prefer atomic steps. |
| No replanning | Agent fails when first plan encounters an unexpected tool result. | Implement max_replans and a feedback loop. |
| Unlimited tool execution | Plan can loop forever. | Set iteration limit, detect repeated states. |
| Ignoring memory quality | Plan repeats steps because memory retrieval is poor. | Improve memory recall; test with relevant history. |
| No evaluation framework | Cannot tell if planning improvements actually help. | Build offline test suite before writing planner. |
| Hardcoding tool names | Changing tool schema breaks all plans. | Use tool registry and dynamic lookup. |
| Planning as an afterthought | Agent behaves reactively, never decomposing goals. | Start with plan‑and‑execute pattern, then refine. |
Case Study: Enterprise Research Agent
Domain: Competitive intelligence. An analyst asks the agent: “Gather information on Acme Corp’s latest product launch, their pricing strategy, and any customer complaints from the last 3 months. Produce a summary with citations.”
Initial Plan (generated by planner)
{
"steps": [
{"action": "web_search", "params": {"query": "Acme Corp product launch 2026"}},
{"action": "web_search", "params": {"query": "Acme Corp pricing strategy"}},
{"action": "web_search", "params": {"query": "Acme Corp customer complaints after:2026-03-01"}},
{"action": "extract_dates", "params": {"source": "$step1.result"}},
{"action": "sentiment_analysis", "params": {"text": "$step3.result"}},
{"action": "generate_summary", "params": {"sources": ["$step1.result","$step2.result","$step3.result"], "insights": ["$step4.result","$step5.result"]}}
]
}
Execution and Replanning
- Step 1 succeeds → returns three product launch articles.
- Step 2 succeeds → pricing page found.
- Step 3 fails → web search returns no complaints (maybe Acme has a different name for support).
Replan triggered. The planner replaces step 3 with:
{"action": "search_trustpilot", "params": {"company": "Acme Corp"}}
- New step succeeds → returns complaints.
- Steps 4 and 5 execute.
- Step 6 generates final summary with citations.
Optimisation Opportunities
- Plan caching – Future “gather info on {competitor}” requests reuse the same plan template, reducing LLM calls.
- Parallel execution – Steps 1, 2, and 3 are independent. The workflow engine can execute them concurrently (cuts latency from 9s to 3s).
- Tool‑specific fallbacks – Instead of replanning, the planner could have included a fallback_tool field in the original step.
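The parallel-execution point can be sketched with asyncio. The `web_search` coroutine below only sleeps to imitate network latency (0.2 s is an arbitrary stand-in), so the timing illustrates the idea rather than reproducing the case-study numbers.

```python
import asyncio
import time


async def web_search(query: str, delay: float = 0.2) -> str:
    """Hypothetical search tool; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"results for {query}"


async def run_independent(queries: list[str]) -> list[str]:
    # gather() starts all searches at once and returns results in input order.
    return await asyncio.gather(*(web_search(q) for q in queries))


def timed(queries: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(run_independent(queries))
    return results, time.perf_counter() - start


QUERIES = [
    "Acme Corp product launch 2026",
    "Acme Corp pricing strategy",
    "Acme Corp customer complaints",
]
results, elapsed = timed(QUERIES)
```

With three independent searches, total wall time is close to the slowest single call instead of the sum of all three.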
Result: The agent completed the research in 12 seconds, using 7 tool calls (the six planned steps plus the replacement for the failed step 3) and 2 planning LLM calls (initial + replan), at a cost of $0.08. Without planning, a single LLM call would likely have tried to answer from memory and risked hallucinating.
FAQ
1. Is planning always necessary for an AI agent?
No. For single‑step tool calls (e.g., “what’s the weather”), planning is overkill. Use planning only when the task requires multiple tools or conditional logic.
2. What is the difference between planning and workflows?
Planning generates the sequence dynamically. Workflows are pre‑defined DAGs. A workflow agent may have no planning component; it simply executes a fixed path. Planning is for adaptive systems.
3. When should an agent replan?
Replan when: (a) a tool fails with a recoverable error (e.g., “no results” → try alternative tool), (b) a tool returns unexpected data that changes the goal, or (c) the user interrupts with new instructions.
4. How deep should planning trees be?
For most tasks, 5–10 steps. Deeper plans (>15) are hard for LLMs to generate correctly and expensive to execute. Decompose large goals into multiple agent turns.
5. Does LangGraph support planning?
Yes, implicitly. LangGraph does not have a separate “planner” node, but you can implement planning as a graph node that calls an LLM to produce a plan, then use conditional edges to execute steps and loop back to the planner for replanning.
6. Can I use planning without LLMs?
Classic AI planning (STRIPS, PDDL) works for deterministic domains. But for natural language goals, LLM‑based planning is far more practical.
7. How do I prevent the planner from inventing non‑existent tools?
Provide a strict list of available tools in the planning prompt, with names and parameter schemas. Also post‑validate the plan against the registry.
8. What is the cost overhead of planning?
Each planning LLM call adds tokens. As a rough guide, for a typical 5‑step plan, planning might consume 20–30% of total tokens; measure this on your own traces. Use a cheaper model for planning to reduce cost.
9. How do I evaluate a planner offline?
Create a dataset of (goal, expected plan). Run your planner on each goal, compare generated plan to expected plan (exact match or semantic similarity). Measure success rate and token usage.
10. Can planning be done in parallel for multiple goals?
Yes, if your agent handles batch requests. However, each goal should have its own plan and state. Do not interleave steps from different plans.
11. What is the role of memory in replanning?
Memory stores the outcomes of previous steps. The replanner must access that memory to decide which steps to skip, retry, or replace.
12. How does human‑in‑the‑loop affect planning?
The plan can include a wait_for_human action. The planner must understand that such steps pause execution indefinitely and require resumption with new input.
13. Is Tree of Thoughts production‑ready?
Rarely. The token cost is high, and latency is unpredictable. ToT is mostly used in research; production systems typically use ReAct or Plan‑and‑Execute.
14. Can a planner learn from past successes?
Yes, by storing successful plans in long‑term memory and retrieving them via similarity search. This is called “plan reuse” or “case‑based planning.”
15. What is the single biggest mistake teams make with planning?
Building a planner that never replans. When a tool returns an unexpected result, the agent blindly continues or crashes. Always include a replanning loop.
Continue Your Journey
Now that you understand how agents plan, explore the other core components that work alongside planning:
- Memory – Agent Memory (plans depend on retrieved context)
- Tool Calling – Tool Calling (executing the plan’s actions)
- Workflows – Agent Workflows (orchestrating plan steps)
- Frameworks – LangGraph Guide (implementing planners in production)
- Evaluation – Agent Evaluation (measuring plan quality)
Or return to the Agent Learning Path to see where planning fits in your roadmap.
This article is part of the AgentDevPro Production Agent Engineering Handbook. Updated for Q2 2026.