Recursive AI Tool Loop
A production-proven pattern for building AI systems that analyze data, decide what they need, fetch it, and refine their own answer — without human intervention. The loop is self-terminating: the AI reports its own confidence, and the orchestrator decides when to stop.
Deep Dive Available — Built by Coulee Tech
Coulee Tech built a production AI Employee system on this pattern with 91 tools and multiple specialized Employees. Their detailed case study is available as a 6-part deep dive with annotated code, flowcharts, and AI thinking callouts.
What This System Does
Given a piece of work (a support ticket, sales lead, code review, patient intake — any domain), the system:
1. Reads the work item and its metadata
2. Guesses which external data sources would help, and pre-fetches them
3. Analyzes the work item using an LLM, with pre-fetched data injected into the prompt
4. Decides whether it has enough information or needs more
5. Fetches additional data using tools the LLM requested
6. Re-analyzes with new data merged in, updating only the sections that changed
7. Repeats steps 4–6 until confident, out of budget, or at the pass limit
8. Saves the final analysis and makes it available to the human
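Stripped of domain detail, the steps above reduce to one loop. The sketch below is illustrative TypeScript, not the case study's actual API: `analyze` stands in for the streaming LLM call, and all field names are assumptions.

```typescript
// Minimal sketch of the self-terminating loop. `analyze` is a placeholder
// for the real streaming LLM call; tool output is faked inline.
type Confidence = 'high' | 'medium' | 'low';

interface PassResult {
  confidence: Confidence;
  actions: string[];  // tool names the AI requested this pass
  document: string;   // current analysis text
}

const MAX_PASSES = 5;

async function runLoop(
  workItem: string,
  analyze: (item: string, enriched: string[]) => Promise<PassResult>,
): Promise<string> {
  const enriched: string[] = [];        // accumulated tool markdown
  const completed = new Set<string>();  // dedup: tools already run
  let result = await analyze(workItem, enriched); // Pass 1

  for (let pass = 2; pass <= MAX_PASSES; pass++) {
    const newActions = result.actions.filter(a => !completed.has(a));
    // Exit conditions: confident, or nothing new left to fetch
    if (result.confidence === 'high' || newActions.length === 0) break;
    for (const action of newActions) {
      completed.add(action);
      enriched.push(`## ${action}\n(tool markdown would go here)`);
    }
    result = await analyze(workItem, enriched); // refinement pass
  }
  return result.document;
}
```

The key property: the model never controls termination directly. It only reports confidence and requests tools; the exit logic stays in ordinary code.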
Architecture Overview
Loop State Machine
┌──────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR │
│ │
│ 1. Pre-Enrichment ─────────────────────────────────────────┐ │
│ │ Lightweight AI call picks which tools to run │ │
│ │ Selected tools execute in parallel │ │
│ │ Results injected into Pass 1 system prompt │ │
│ ▼ │ │
│ 2. Pass 1: Main Analysis ──────────────────────────────────┤ │
│ │ Streaming LLM call with full prompt + enriched data │ │
│ │ Response parsed into section map (living document) │ │
│ ▼ │ │
│ 3. Decision Gate ──────────────────────────────────────────┤ │
│ │ Extract confidence + ai_actions from response │ │
│ │ Check exit conditions (confident? budget? max pass?) │ │
│ │ Filter to new, non-duplicate actions │ │
│ ▼ │ │
│ 4. Tool Execution ─────────────────────────────────────────┤ │
│ │ Run required + recommended tools in parallel │ │
│ │ Each tool returns { summary, markdown, data } │ │
│ │ Markdown accumulated into enriched context │ │
│ ▼ │ │
│ 5. Refinement Pass (2, 3, ... up to 5) ───────────────────┤ │
│ │ System prompt = base rules + ALL enriched data │ │
│ │ User message = current document + tool summaries │ │
│ │ AI outputs ONLY sections that need changes │ │
│ │ Sections merged into existing document │ │
│ → Back to Decision Gate (step 3) │ │
│ ▼ │ │
│ 6. Complete ───────────────────────────────────────────────┘ │
│ Auto-save analysis to backend │
│ Restore model if escalated during the loop │
└──────────────────────────────────────────────────────────────────┘
Data Flow
Work Item (ticket, order, case, etc.)
│
▼
Pre-Enrichment AI ──selects──▶ Tool Registry ──dispatches──▶ Individual Tools
│
graphqlFetch() / REST / AI sub-call
│
ToolResult {
summary, ← for UI
markdown, ← for prompt injection
data ← for structured access
}
│
Injected into LLM system prompt as context
The Five Key Components
The Orchestrator
Manages the loop lifecycle — phases, pass counting, budget tracking, tool deduplication, model escalation, and exit conditions.
The AI Analysis Engine
Sends the LLM call (streaming), manages prompt templates, and fires a completion callback when the stream finishes.
The Tool System
A registry of async functions that fetch data from external systems. Each tool takes standardized inputs and returns a standardized result.
The Pre-Enrichment Engine
Before the main analysis starts, a lightweight AI call decides which tools to run upfront — replacing keyword matching with intent reasoning.
Smart Tools
Tools that optionally use an AI sub-call to post-process their own results before returning — filtering 40 results down to the 2-3 most relevant.
Orchestrator Exit Conditions
Checked at every Decision Gate — the loop stops when any of these are true:
| Condition | Trigger |
|---|---|
| Confident | AI reports high confidence and has no required/recommended actions |
| No new actions | AI requested tools, but all were already run |
| Max passes | Hard limit (5 passes) to prevent infinite loops |
| Stale confidence | Same confidence level for 2+ consecutive passes |
| Budget exceeded | Total cost exceeds per-analysis budget ($0.10 default) |
| Manual stop | User disabled auto mode or clicked abort |
Orchestrator Decision Gate (pseudocode)
on_analysis_complete(response, usage):
    update living document (build or merge sections)
    record pass in history
    if pass >= MAX_PASSES → exit("max-passes")
    if total_cost >= budget → exit("budget-exceeded")
    actions = extract_ai_actions(response)
    new_actions = actions.filter(not already completed)
    if new_actions is empty:
        if confidence == "high" → exit("confident")
        else → exit("no-new-actions")
    if pass >= 3 and confidence == last_confidence → exit("stale-confidence")
    run_tools(new_actions)
    → on tools complete → build refinement prompt → start next pass
The Tool System
Each tool is a function that fetches ONE type of data from ONE source. Think of them as sensors — each one looks at the world from a different angle.
| Type | Description | AI Cost | Example |
|---|---|---|---|
| Script tools | Pure data fetching. Call an API, format the response, return markdown. No AI involved. | $0.00 | fetch_device_info |
| Smart tools | Fetch data, then optionally use an AI sub-call to filter/rank before returning. Always has a code-based fallback. | ~$0.0003 | fetch_sop_docs |
Standardized ToolResult Interface
Tool Types
interface ToolResult {
  success: boolean;
  toolName: string;
  data: unknown;     // raw structured data (for UI or further processing)
  summary: string;   // short text for inline UI display
  markdown: string;  // full formatted output for prompt injection ← CRITICAL
  error?: string;
}

interface ToolContext {
  companyId: string;     // scoping — which customer/account
  deviceId?: string;     // scoping — which asset
  ticketId?: string;     // scoping — which work item
  contactEmail?: string;
  companyName: string;
  modelId?: string;      // optional — enables AI sub-calls inside tools
  ticketTitle?: string;
}

type ToolFunction = (
  params: ToolParams,
  context: ToolContext,
  token: string
) => Promise<ToolResult>;
Script Tool Template
import type { ToolFunction, ToolResult } from './types';

export const execute: ToolFunction = async (params, context, token) => {
  const fail = (msg: string): ToolResult => ({
    success: false,
    toolName: 'fetch_your_thing',
    data: null,
    summary: msg,
    markdown: `> **Your Thing:** ${msg}`,
    error: msg,
  });
  try {
    const rawData = await yourApiCall(context.companyId, params.target, token);
    if (!rawData || rawData.length === 0) {
      return fail('No data found');
    }
    const markdown = [
      `## Your Thing — ${context.companyName}`,
      `| Name | Status | Last Updated |`,
      `|------|--------|--------------|`,
      ...rawData.map(item => `| ${item.name} | ${item.status} | ${item.date} |`),
    ].join('\n');
    return {
      success: true,
      toolName: 'fetch_your_thing',
      data: rawData,
      summary: `Found ${rawData.length} items`,
      markdown,
    };
  } catch (err) {
    return fail(err instanceof Error ? err.message : 'Unknown error');
  }
};
Smart Tool Template (with AI sub-call)
import { chatComplete } from '@/lib/ai-call';
import type { ToolFunction } from './types';

const AI_FILTER_THRESHOLD = 5;
const FILTER_PROMPT = `You are filtering results for relevance to a work item.
Given the context and a list of results, select the 1-3 most relevant.
Respond with ONLY a JSON array: [{"index": 0, "reason": "why"}]`;

export const execute: ToolFunction = async (params, context, token) => {
  const rawResults = await fetchFromApi(params, context, token);
  // AI filtering when available and data is large enough
  if (context.modelId && rawResults.length >= AI_FILTER_THRESHOLD) {
    try {
      const aiResult = await chatComplete(context.modelId, [
        { role: 'system', content: FILTER_PROMPT },
        { role: 'user', content: formatForAI(rawResults, context) },
      ], { maxTokens: 512, timeoutMs: 10_000 });
      const selections = parseAiResponse(aiResult.content);
      if (selections.length > 0) {
        return buildFocusedResult(selections, rawResults);
      }
    } catch {
      // Fall through to code-based filtering — ALWAYS have a fallback
    }
  }
  return codeBasedResult(rawResults, params.target);
};
How to Build This for Your Application
Define Your Domain
What is your “work item”? A support ticket, sales lead, code review, patient intake, financial transaction? Everything flows from this.
Then answer three questions: What data is attached at the start? What external sources could enrich it? What does “done” look like?
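One way to force those answers is to write the work item down as a type. This is a hypothetical sketch; every field name here is an assumption to be replaced with your domain's real shape.

```typescript
// Hypothetical domain model. Rename everything to fit your world.
interface WorkItem {
  id: string;
  kind: 'ticket' | 'lead' | 'review' | 'intake'; // what flows through the loop
  title: string;
  body: string;
  metadata: Record<string, string>; // whatever is attached at the start
  linkedEntityIds: string[];        // things tools can enrich (devices, users, ...)
}

// "Done" must be explicit too: high confidence and nothing left to fetch.
function isDone(
  confidence: 'high' | 'medium' | 'low',
  pendingActions: number
): boolean {
  return confidence === 'high' && pendingActions === 0;
}
```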
Design Your Tools
Each tool fetches ONE type of data from ONE source. Name them fetch_<what>. Design principles:
- Single responsibility — one tool per data source
- Scoped by context — tools receive the work item's context and scope their queries
- Graceful failure — return {success: false, error: "..."}, never throw
- Markdown output is the product — the markdown field is what the LLM sees
Build the Tool Registry
A central map of toolName → { execute, description }. Add aliases for names the LLM might generate.
Tool Registry
export const TOOLS: Record<string, ToolMeta> = {
  fetch_device_info: {
    name: 'fetch_device_info',
    description: 'Fetch device details — hardware, OS, network, status',
    execute: fetchDeviceInfo,
  },
  // Aliases — LLMs sometimes generate slightly different names
  fetch_devices: {
    name: 'fetch_device_info',
    description: 'Alias for fetch_device_info',
    execute: fetchDeviceInfo,
  },
};

// Single dispatcher — wraps any thrown exceptions into a failed ToolResult
async function executeTool(
  name: string,
  params: ToolParams,
  context: ToolContext,
  token: string
): Promise<ToolResult> {
  const tool = TOOLS[name] ?? TOOLS[Object.keys(TOOLS).find(k => k.includes(name)) ?? ''];
  if (!tool) return { success: false, toolName: name, data: null, summary: 'Unknown tool', markdown: '' };
  try {
    return await tool.execute(params, context, token);
  } catch (err) {
    return { success: false, toolName: name, data: null, summary: String(err), markdown: '' };
  }
}
Build the Pre-Enrichment Engine
A lightweight AI call that looks at the work item and decides which tools to run BEFORE the main analysis. This replaces keyword matching — the AI reasons about intent.
Pre-Enrichment System Prompt
const PRE_ENRICHMENT_PROMPT = `
You select data-gathering tools to run before a [work item] is analyzed.
Pick only tools that would provide genuinely useful context for THIS work item.
Available tools:
- fetch_device_info: Fetch device hardware, OS, network. Use when a device is mentioned.
- fetch_backup_status: Check backup health. Use for data loss, disk, or reliability issues.
- fetch_user_info: Look up user account details. Use when a user or email is mentioned.
Respond with ONLY a JSON array:
[{"tool": "tool_name", "params": {"target": "value"}, "reason": "why"}]
`;

// Execute selected tools in parallel with a hard timeout
const results = await Promise.allSettled(
  selectedTools.map(({ tool, params }) =>
    Promise.race([
      executeTool(tool, params, context, token),
      new Promise<ToolResult>((_, reject) =>
        setTimeout(() => reject(new Error('timeout')), 12_000)
      ),
    ])
  )
);
Build the Main Analysis Prompt
Two parts: a static system prompt (rules, tool descriptions, confidence contract) and a per-work-item user message (all metadata, linked entities, output format).
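Assembling the two parts is plain string concatenation. A minimal sketch, assuming the base rules and the enriched tool markdown are already strings:

```typescript
// Static rules stay constant across passes; enriched sections accumulate.
function buildSystemPrompt(baseRules: string, enrichedSections: string[]): string {
  if (enrichedSections.length === 0) return baseRules; // Pass 1 with no pre-enrichment
  return [
    baseRules,
    '## Enriched Data',
    'The following was fetched from external systems for this work item:',
    ...enrichedSections,
  ].join('\n\n');
}
```

Because refinement passes append rather than rewrite, capping each section's size (see the safety rails later in this guide) is what keeps this from blowing out the context window.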
Confidence Contract (include in system prompt)
## Confidence Contract
After each analysis pass, you MUST output a confidence assessment in this JSON block:
```json
{
"analysis_confidence": "high | medium | low",
"confidence_reason": "why you chose this level",
"ai_actions": [...]
}
```
- "high" = You have specific, actionable data. Not guessing.
- "medium" = Reasonable analysis but tool data would improve it.
- "low" = Mostly guessing. Critical context is missing.
When you report "high", the loop EXITS and your answer is shown.
Do not report "high" to end the loop early if you are uncertain.
Build the AI Actions Schema
The AI requests tools via a structured JSON block in its response. Define the exact schema and include it in the system prompt.
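Extracting that block from a raw response has to be defensive, since models wrap JSON in code fences or surrounding prose. A sketch (the regex strategy is an assumption, not the case study's actual parser):

```typescript
interface AiAction {
  action_id: string;
  action_type: string;
  reason: string;
  parameters: Record<string, string>;
  priority: 'required' | 'recommended' | 'nice_to_have';
}

// Pull the ai_actions array out of a raw LLM response. Tolerates
// code fences and prose around the JSON; returns [] on any failure.
function extractAiActions(response: string): AiAction[] {
  const match = response.match(/\{[\s\S]*"ai_actions"[\s\S]*\}/);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]);
    return Array.isArray(parsed.ai_actions) ? parsed.ai_actions : [];
  } catch {
    return []; // malformed JSON: treat as "no actions requested"
  }
}
```

Returning an empty array on any parse failure is deliberate: a malformed block reads as "no actions requested," which pushes the loop toward a clean exit instead of a crash.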
AI Actions Schema
{
  "ai_actions": [
    {
      "action_id": "unique-kebab-case-id",
      "action_type": "fetch_device_info",
      "reason": "How this data would change the recommendation",
      "parameters": { "target": "hostname-or-search-term" },
      "priority": "required | recommended | nice_to_have"
    }
  ]
}

// Priority rules:
//   required + recommended → execute automatically
//   nice_to_have → skip (reduces cost, the AI said it's not critical)
Build the Refinement Prompt
// nice_to_have → skip (reduces cost, the AI said it's not critical)Build the Refinement Prompt
For Pass 2+, the prompt changes. The system prompt gets all accumulated tool data appended. The user message contains the current document and refinement instructions.
Refinement Instructions (user message)
## Refinement Instructions
- ONLY output sections that NEED CHANGES based on the new data.
- Sections you do not output will remain unchanged.
- Use the SAME section headers as your original output.
- NEVER re-request a tool that already ran.
- Output an empty ai_actions array if all needed data is gathered.
## Tools Already Executed
- fetch_device_info (target: HOSTNAME): Success — Found device with ...
- fetch_backup_status: Success — 3 appliances, 12 agents
- fetch_azure_user (target: user@email.com): Failed — User not found
Build the Living Document (Section Map)
The analysis is a map of typed sections, not a monolithic string. On Pass 1, build it from scratch. On Pass 2+, merge — only sections present in the new response overwrite existing ones.
Section Map Types
interface SectionState {
  section: {
    heading: string;      // e.g. "🔍 Situation"
    headingText: string;  // e.g. "Situation"
    type: SectionType;    // enum: situation | findings | next-steps | ...
    body: string;
    emoji: string;
  };
  lastUpdatedPass: number;
  lastUpdatedAt: number;
}

type SectionMap = Map<SectionType, SectionState>;

// Merge function — only update sections present in the new response
function mergeSections(existing: SectionMap, updated: SectionMap): SectionMap {
  const merged = new Map(existing);
  for (const [type, state] of updated) {
    merged.set(type, state);
  }
  return merged;
}
Build the Orchestrator Loop
Tie it all together. The orchestrator is a state machine with 5 critical implementation details:
- Deduplication: Track action signatures (actionType::JSON(params)) in a Set. Never re-run the same tool with the same params.
- Budget tracking: Sum estimatedCost from every LLM call's usage stats. Check against budget at every Decision Gate.
- Stale confidence detection: If confidence hasn't improved after 2+ passes, the tools aren't helping. Stop.
- Model escalation: If confidence is still “low” after 2+ passes, temporarily switch to a more capable model. Restore when the loop completes.
- Parallel tool execution: Always run all requested tools in parallel (Promise.allSettled), never sequentially.
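The deduplication detail fits in a few lines. This sketch canonicalizes params by sorting keys, a small variation on the `actionType::JSON(params)` signature so that key order alone can't defeat the dedup:

```typescript
// Action signature: tool name + canonicalized params. Two requests for
// the same tool with the same params collapse to one execution.
function actionSignature(actionType: string, params: Record<string, string>): string {
  const canonical = Object.keys(params)
    .sort()
    .map(k => `${k}=${params[k]}`)
    .join('&');
  return `${actionType}::${canonical}`;
}

const completed = new Set<string>();

// Returns true (and records the signature) only the first time
// this exact tool + params combination is seen.
function isNewAction(actionType: string, params: Record<string, string>): boolean {
  const sig = actionSignature(actionType, params);
  if (completed.has(sig)) return false;
  completed.add(sig);
  return true;
}
```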
Add Safety Rails
Context blowout prevention, graceful degradation, and timeout management:
Context Blowout Prevention
- Cap each tool's markdown (4000 chars)
- Limit pre-enrichment to 5 tools
- Hard pass limit (5)
- Cost budget with escalation
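The per-tool markdown cap is worth centralizing as a helper. A sketch, with the truncation-notice wording assumed:

```typescript
const MAX_TOOL_MARKDOWN = 4_000; // chars per tool result, per the rail above

// Truncate before prompt injection, and say so, so the model knows
// the data was cut rather than absent.
function capMarkdown(markdown: string, limit: number = MAX_TOOL_MARKDOWN): string {
  if (markdown.length <= limit) return markdown;
  return markdown.slice(0, limit) + '\n\n> *(output truncated for context budget)*';
}
```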
Graceful Degradation
- Pre-enrichment fails → default tools
- Smart tool AI fails → code fallback
- Tool fails → return error as ToolResult
- Streaming error → show what we have
Timeout Management
- Pre-enrichment tools: 12s timeout
- AI sub-calls in smart tools: 10s
- Pre-enrichment AI picker: 8s
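All three timeouts share one shape, a `Promise.race` against a deadline, so a single wrapper can serve them. Note that the underlying work is not cancelled (the fetch keeps running); the loop simply stops waiting for it:

```typescript
// Race any promise against a deadline. Rejects with a labeled error
// when the deadline wins; otherwise resolves/rejects with the work.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return Promise.race([
    work,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
    ),
  ]);
}

// Usage sketch: withTimeout(executeTool(name, params, ctx, token), 12_000, name)
```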
Context Management Deep Dive
The biggest challenge in a recursive AI loop is managing the context window. Each pass adds more data. Here's how to keep it under control:
System Prompt Growth
Pass 1 system prompt:
Base rules (~4,000 tokens)
+ Pre-enriched tool data (~2,000 tokens)
≈ 6,000 tokens
Pass 3 system prompt:
Base rules (~4,000 tokens)
+ Pre-enriched data (~2,000 tokens)
+ Pass 1 tool results (~3,000 tokens)
+ Pass 2 tool results (~3,000 tokens)
≈ 12,000 tokens
User Message Changes
Pass 1 user message:
Full work item data + notes + contacts
+ format spec
≈ 3,000–8,000 tokens
Pass 2+ user message:
Current document (section map) (~2,000)
+ Tool execution summary (~500)
+ Compact work item data (~1,500)
+ Refinement instructions (~500)
≈ 4,500 tokens
Real-World Cost Analysis
Using Gemini Flash tier pricing as a reference point:
| Component | Tokens | Cost |
|---|---|---|
| Pre-enrichment AI picker | ~800 | $0.0005 |
| Pass 1 analysis | ~12,000 | $0.003 |
| Smart tool sub-call | ~1,000 | $0.0005 |
| Pass 2 refinement | ~15,000 | $0.004 |
| Pass 3 refinement | ~18,000 | $0.005 |
| Typical 2-pass analysis | — | ~$0.008 |
| Complex 4-pass analysis | — | ~$0.02 |
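Numbers like these come from straight accumulation over every call's usage stats. A sketch with assumed per-million-token rates; substitute your provider's actual pricing:

```typescript
// Assumed reference rates (USD per 1M tokens). These are illustrative,
// not any provider's published pricing.
const INPUT_PER_M = 0.10;
const OUTPUT_PER_M = 0.40;

function estimatedCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * INPUT_PER_M
       + (outputTokens / 1_000_000) * OUTPUT_PER_M;
}

let totalCost = 0;
const BUDGET = 0.10; // per-analysis cap, from the exit-conditions table

// Record one LLM call's usage; false means the Decision Gate
// should exit with "budget-exceeded".
function recordUsage(inputTokens: number, outputTokens: number): boolean {
  totalCost += estimatedCost(inputTokens, outputTokens);
  return totalCost < BUDGET;
}
```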
Adapting to Your Domain
Replace these concepts with your equivalents — the loop architecture stays the same:
| Our Concept | Your Equivalent | Examples |
|---|---|---|
| Support ticket | Work item | Sales lead, code PR, patient chart, order |
| Company | Account/scope | Customer, organization, project, repository |
| Device | Asset | Server, product, vehicle, instrument |
| Contact | Person | Lead, patient, author, stakeholder |
| Ticket notes | Activity log | Comments, events, messages, lab results |
| GraphQL API | Your data layer | REST API, database queries, file system |
| Auth token | Your auth | API key, OAuth token, session cookie |
Quick Reference: Build Order
If starting from scratch, build in this order to get something working as fast as possible:
1. ToolResult type + 2 simple tools — get data flowing
2. Tool registry + dispatcher — centralized lookup
3. Main analysis prompt — system prompt with confidence contract
4. Streaming LLM call — get Pass 1 working end-to-end
5. AI Actions parser — extract tool requests from the response
6. Orchestrator loop — wire up analyze → decision → tools → refine
7. Section map + merge — make refinement passes update, not replace
8. Pre-enrichment — reduce the number of passes needed
9. Smart tools — add AI sub-processing to high-volume tools
10. Budget + safety rails — cost tracking, dedup, stale detection