Intermediate
Tags: rag, retrieval, agents, knowledge, llm

Agentic RAG

Standard RAG searches once and hopes for the best. This pattern replaces the fixed pipeline with a decision-making agent that picks the right search tool for each question, checks whether it found the right answer, and searches again when it is not confident.

Used in
Agentic RAG Architecture
[Interactive diagram: pan, zoom, and explore. Click export to download as PNG.]

๐Ÿ“The failure modes of classic RAG โ€” and how an agent fixes them

Why a single vector search is not enough

Retrieval-Augmented Generation (RAG) was invented to solve a core problem with language models: they do not know about your data, and their knowledge has a cutoff date. The fix is to search your data at query time and include the results in the prompt so the model can answer questions grounded in real information.

The simplest version works fine for easy questions: embed the question, find similar documents in a vector database, include them in the prompt, generate an answer. But it breaks on questions that require more than one step ("What changed between our Q2 and Q3 reports?"), and on questions where the first vector search returns irrelevant results because the phrasing did not match the indexed content. It also breaks when the answer needs both a database query for structured numbers and a document search for context around those numbers.

Agentic RAG replaces the fixed linear pipeline with a reasoning loop. Instead of always doing "vector search then generate," an agent first thinks about what kind of question this is and what kind of search would actually help. It picks from a set of retrieval tools (dense vector search, SQL, knowledge graph traversal, live web search) and can use several in sequence for complex questions. A separate validation step checks whether the retrieved information actually answers the question. If not, the agent reformulates the query and tries again.
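The loop described above can be sketched in a few lines. Everything here is a stub: `route`, `retrieve`, `validate`, and `rewrite` stand in for LLM calls and real search backends, and the tiny corpus is invented for illustration.

```python
# Minimal sketch of the agentic RAG loop. All components are stubs;
# in a real system each would be an LLM call or a search backend.

def route(question: str) -> str:
    """Pick a retrieval tool based on crude keyword heuristics."""
    if "how many" in question or "total" in question:
        return "sql"
    if "related" in question:
        return "graph"
    return "vector"

def retrieve(tool: str, query: str) -> list[str]:
    """Stand-in for the actual search backends."""
    corpus = {
        "vector": ["The Q3 report shows revenue grew 12%."],
        "sql": ["SELECT SUM(revenue) -> 4.2M"],
        "graph": ["AcmeCo -[managed_by]-> J. Smith"],
    }
    return corpus.get(tool, [])

def validate(question: str, chunks: list[str]) -> bool:
    """Stub validator: accept non-empty results that share vocabulary
    with the question."""
    words = set(question.lower().split())
    return any(words & set(c.lower().split()) for c in chunks)

def rewrite(query: str) -> str:
    """Stub rewriter: fall back to a broader phrasing."""
    return query.rstrip("?") + " details"

def answer(question: str, max_iters: int = 3) -> str:
    query = question
    for _ in range(max_iters):           # hard iteration cap
        tool = route(query)
        chunks = retrieve(tool, query)
        if validate(query, chunks):
            return f"Answer grounded in: {chunks[0]}"
        query = rewrite(query)           # self-correction step
    return "Best-effort answer (low confidence)."

print(answer("What does the Q3 report say about revenue?"))
```

The cap on the loop matters: without it, an unanswerable question would cycle through rewrites forever.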

Next up: Routing & Planning Agent
Deciding what to search, where, and in what order

Routing & Planning Agent

The routing agent is the entry point. Its job is to look at a question and decide: what kind of information does this need, where does that information live, and should I search one source or several in sequence? For a simple factual question it might go directly to vector search. For a financial comparison it might query a structured database first for the raw numbers, then search documents for context. For a recent event it might skip the internal knowledge base entirely and go to web search. It also manages the iteration cycle: if a search returns empty or irrelevant results, the routing agent rewrites the query and tries a different approach rather than giving up.
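A minimal sketch of that routing decision, assuming a keyword-rule classifier in place of the LLM a real router would use; the tool names and rules are illustrative:

```python
# Illustrative planner: map a question to an ordered list of retrieval
# tools. Real systems would use an LLM for this classification; the
# keyword rules here only demonstrate the shape of the decision.

def plan(question: str) -> list[str]:
    q = question.lower()
    if "this week" in q or "latest" in q:
        return ["web"]                  # fresh info: skip the internal KB
    if "compare" in q and "revenue" in q:
        return ["sql", "vector"]        # numbers first, then context
    if "related" in q or "connected" in q:
        return ["graph"]
    return ["vector"]                   # default: semantic document search

print(plan("Compare revenue across Q2 and Q3"))   # ['sql', 'vector']
print(plan("What is the latest on the merger?"))  # ['web']
```

Note that the financial-comparison case returns an ordered plan, not a single tool: the agent runs the SQL query first and uses the document search to explain the numbers it found.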


Query Rewriter

When the first search does not return useful results, the rewriter reformulates the query. It might break a complex question into simpler sub-questions, add missing context, or switch from natural-language phrasing to more precise search terms.
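One simple decomposition strategy can be sketched as follows; a production rewriter would be an LLM prompt, and splitting on "and" here only illustrates the shape of the transformation:

```python
# Rule-based sketch of query decomposition: break a compound question
# into simpler sub-questions that can each be searched separately.

def decompose(question: str) -> list[str]:
    """Split a compound question on ' and ' into sub-questions."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]

print(decompose("What changed in Q2 and what changed in Q3?"))
# ['What changed in Q2?', 'what changed in Q3?']
```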


Plan Scratchpad

A running log of what has already been searched and what results each search returned. This prevents the agent from running the same search twice and helps it understand what information is still missing.
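A scratchpad needs little more than a log and a seen-set. A minimal sketch (the `Scratchpad` class and its fields are illustrative, not from any library):

```python
# A minimal scratchpad that records (tool, query) pairs so the agent
# never repeats a search and can review what each attempt returned.
from dataclasses import dataclass, field

@dataclass
class Scratchpad:
    history: list = field(default_factory=list)  # full log of attempts
    seen: set = field(default_factory=set)       # (tool, query) pairs

    def record(self, tool: str, query: str, results: list) -> None:
        self.seen.add((tool, query))
        self.history.append({"tool": tool, "query": query, "results": results})

    def already_tried(self, tool: str, query: str) -> bool:
        return (tool, query) in self.seen

pad = Scratchpad()
pad.record("vector", "q3 revenue", ["chunk-1"])
print(pad.already_tried("vector", "q3 revenue"))  # True
print(pad.already_tried("sql", "q3 revenue"))     # False
```

Before issuing a search, the agent checks `already_tried`; the `history` log is also what a planner would read to decide what information is still missing.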


Reranker

After retrieval returns a list of candidate results, the reranker re-scores them based on their actual relevance to the specific question, not just their embedding similarity score. This significantly improves what gets passed to the answer generator.
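A toy version of reranking, using word overlap with the question as the relevance score; real rerankers use a cross-encoder model, so treat this as a sketch of the interface rather than a workable scorer:

```python
# Toy reranker: re-score retrieved candidates by word overlap with the
# question instead of trusting the retriever's original order.

def rerank(question: str, candidates: list[str], top_k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    def score(chunk: str) -> int:
        return len(q_words & set(chunk.lower().split()))
    # Highest-scoring chunks first; only top_k survive to the generator.
    return sorted(candidates, key=score, reverse=True)[:top_k]

docs = [
    "Office party planning notes",
    "Q3 revenue grew due to enterprise deals",
    "Q3 revenue summary and revenue outlook",
]
print(rerank("Why did Q3 revenue grow?", docs, top_k=1))
```

The key design point is the narrowing: the retriever can be generous (high recall), because the reranker decides what limited context actually reaches the prompt.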


Max-Iteration Cap

A hard limit on how many times the agent is allowed to retry before returning a best-effort answer. Without this, an ambiguous question could trigger an infinite loop of searches.

Next up: Multi-Source Retrievers
Using the right storage system for each type of question

Multi-Source Retrievers

A major limitation of simple RAG is assuming all knowledge lives in one place โ€” a vector database full of embedded documents. Real organizations have knowledge spread across many different types of storage, each of which is the best tool for a different kind of question. Agentic RAG exposes each storage system as a tool that the routing agent can call. The agent picks the right tool for the query, or calls several in sequence if the question spans multiple sources.
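The tool-registry idea can be sketched as a plain dict mapping tool names to retriever functions; the retriever bodies here are stubs for what would be a vector DB client, a SQL engine, a graph store, and a web search API:

```python
# Sketch of exposing each storage system as a named tool the routing
# agent can call. Adding a new data source means adding one entry.

def vector_search(q): return [f"doc similar to '{q}'"]
def sql_query(q):     return [f"rows for '{q}'"]
def graph_walk(q):    return [f"paths touching '{q}'"]
def web_search(q):    return [f"live pages about '{q}'"]

TOOLS = {
    "vector": vector_search,
    "sql": sql_query,
    "graph": graph_walk,
    "web": web_search,
}

def run_plan(plan: list[str], query: str) -> list[str]:
    """Call each tool in the planned order and pool the results."""
    results = []
    for name in plan:
        results.extend(TOOLS[name](query))
    return results

print(run_plan(["sql", "vector"], "Q3 revenue"))
```

This is what makes the architecture extensible: a new source is a new entry in `TOOLS`, not a redesign of the pipeline.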


Vector DB

Best for questions about unstructured text: documents, emails, meeting notes, support tickets. Returns results based on semantic similarity rather than exact keyword matching. Works well when you know roughly what you're looking for but not the exact words.


SQL Engine

Best for precise, structured questions: sales figures, user counts, date ranges, aggregations. The agent writes a query (or uses a text-to-SQL layer) and gets an exact answer rather than a similarity score.


Knowledge Graph

Best for relationship questions: "how is concept A related to concept B?", "which customers share the same account manager?" Graphs can traverse multi-hop connections that a flat vector index cannot follow.
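A toy illustration of that multi-hop traversal over an adjacency map (the graph contents are invented); a breadth-first walk answers the shared-account-manager question directly:

```python
# Toy multi-hop traversal: a flat vector index cannot answer "which
# customers share the same account manager?", but a breadth-first walk
# over an adjacency map can.
from collections import deque

GRAPH = {
    "AcmeCo": ["J. Smith"],        # customer -> account manager
    "BetaInc": ["J. Smith"],
    "GammaLLC": ["R. Jones"],
    "J. Smith": ["AcmeCo", "BetaInc"],
    "R. Jones": ["GammaLLC"],
}

def within_hops(start: str, hops: int) -> set[str]:
    """All nodes reachable from `start` in at most `hops` steps."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == hops:
            continue
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen - {start}

# AcmeCo -> J. Smith -> BetaInc: a 2-hop connection a vector index misses
print(within_hops("AcmeCo", 2))
```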


Web Search

Best for recent events, fast-changing information, or topics not covered by internal sources. Results are live and current, but require more careful validation since they are unvetted.

Next up: Validator & Self-Correction
Checking the answer before it reaches the user

Validator & Self-Correction

The most important question in any RAG system is not "did I find something?" but "did I find the right thing?" A vector database will always return results; whether those results actually answer the question is a separate matter. Without a validation step, the system can confidently generate an answer that sounds plausible but is wrong, because the model filled in gaps from its training data rather than from the retrieved context. A validator agent reads the retrieved chunks and the draft answer together and asks: does the answer follow from what was actually retrieved? Is there anything in the sources that contradicts the answer? Are all the claims traceable to a specific retrieved passage, or did the model invent some of them?
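A crude sketch of such a grounding check, using lexical overlap where a real validator would use an LLM judge; the `min_overlap` threshold and the function name are assumptions for illustration:

```python
# Sketch of a grounding check: does every sentence of the draft answer
# share enough vocabulary with some retrieved chunk? Lexical overlap is
# a stand-in for an LLM-based entailment judgment.

def grounded(draft: str, chunks: list[str], min_overlap: int = 2) -> bool:
    for sentence in filter(None, (s.strip() for s in draft.split("."))):
        words = set(sentence.lower().split())
        if not any(len(words & set(c.lower().split())) >= min_overlap
                   for c in chunks):
            return False          # a claim with no supporting chunk
    return True

chunks = ["Q3 revenue grew 12 percent on enterprise deals"]
print(grounded("Q3 revenue grew 12 percent.", chunks))   # True
print(grounded("The CEO resigned in October.", chunks))  # False
```

The second draft is the interesting case: it is a perfectly plausible sentence, but nothing in the retrieved context supports it, so the validator rejects it before it reaches the user.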


Citation Checker

Every statement in the final answer should be traceable to a specific retrieved chunk. The citation checker builds this mapping and flags any claim that is not grounded in the retrieved context; those claims are likely hallucinated.
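A toy citation mapper along those lines: each answer sentence gets the index of its best-supporting chunk, or `None` when nothing supports it. Overlap scoring again stands in for a proper entailment check, and the threshold of 2 shared words is arbitrary:

```python
# Toy citation mapper: pair each answer sentence with the index of its
# best-supporting chunk; None marks a likely hallucinated claim.

def cite(draft: str, chunks: list[str]):
    mapping = []
    for sentence in filter(None, (s.strip() for s in draft.split("."))):
        words = set(sentence.lower().split())
        scores = [len(words & set(c.lower().split())) for c in chunks]
        best = max(range(len(chunks)), key=scores.__getitem__)
        # Require at least 2 shared words to count as a citation.
        mapping.append((sentence, best if scores[best] >= 2 else None))
    return mapping

chunks = ["Q3 revenue grew 12 percent", "headcount stayed flat in Q3"]
draft = "Q3 revenue grew 12 percent. The CEO resigned."
print(cite(draft, chunks))
# [('Q3 revenue grew 12 percent', 0), ('The CEO resigned', None)]
```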


Contradiction Detector

Sometimes the model's draft answer contradicts what the retrieved sources actually say: it generated a plausible-sounding but wrong answer despite having the right context. The contradiction detector catches these before they reach the user.


Confidence Scorer

A numerical estimate of how well the retrieved context covers the question. Below a threshold, the system does not trust the answer enough to return it; it triggers another retrieval round instead of giving the user a low-confidence answer.


Re-Retrieval Loop

When confidence is too low, the agent does not give up: it tries again with a reformulated query, a different tool, or additional context. The loop exits when confidence is high enough, or the iteration cap is reached.
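The confidence-gated loop might be sketched like this, with the scorer, the stub index, and the rewrite all standing in for model calls; the 0.5 threshold is an arbitrary assumption:

```python
# Sketch of the confidence-gated re-retrieval loop: score coverage of
# the question, and below the threshold retry with a reformulated query
# until the iteration cap is reached.

def confidence(question: str, chunks: list[str]) -> float:
    """Fraction of question words covered by the retrieved chunks."""
    q = set(question.lower().split())
    covered = set().union(*(set(c.lower().split()) for c in chunks))
    return len(q & covered) / len(q)

def retrieve(query: str) -> list[str]:
    # Stub index: only the rewritten query hits anything useful.
    index = {"q3 revenue growth": ["q3 revenue did grow on enterprise deals"]}
    return index.get(query, [])

def answer_with_retries(question: str, threshold: float = 0.5, cap: int = 3):
    query, attempts = question, []
    for _ in range(cap):
        chunks = retrieve(query)
        score = confidence(question, chunks)
        attempts.append((query, round(score, 2)))
        if score >= threshold:
            return chunks, attempts       # confident enough to answer
        query = "q3 revenue growth"       # stub rewrite step
    return [], attempts                   # cap reached: best effort

chunks, attempts = answer_with_retries("why did q3 revenue grow")
print(attempts)
```

The first attempt scores 0.0 (nothing retrieved), the rewritten query scores well above the threshold, and the loop exits on the second iteration with the cap never tested.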

Next up: When to Use This Pattern
Signs this is the right architecture for your situation

When to Use This Pattern

Questions span multiple types of data: documents, structured databases, relationship graphs, or live information
Simple vector search consistently returns results that do not actually answer the question
Your domain requires verifiable, cited answers: medical, legal, financial, or compliance use cases where hallucinations are unacceptable
Your knowledge base changes frequently or is too large and varied for a single static index to cover well
Next up: Trade-offs
What you gain, and what it costs

Trade-offs

Benefit: Significantly higher accuracy on multi-step or multi-source questions
Cost: Each retrieval round adds LLM calls; latency and cost per question are higher than simple RAG

Benefit: Retrieval strategy adapts to the specific question rather than following a fixed pipeline
Cost: A bad routing decision early in the loop can cascade into a wrong answer even if validation catches it later

Benefit: Self-correction catches many hallucinations before they reach the user
Cost: Without a hard iteration cap, difficult or ambiguous questions can trigger excessive retries

Benefit: Any data source can be added as a retrieval tool without redesigning the pipeline
Cost: Non-deterministic loops make testing and evaluation harder than a simple fixed pipeline