Agentic RAG
Standard RAG searches once and hopes for the best. This pattern replaces the fixed pipeline with a decision-making agent that picks the right search tool for each question, checks whether it found the right answer, and searches again when it is not confident.
Why a single vector search is not enough
Retrieval-Augmented Generation (RAG) was invented to solve a core problem with language models: they do not know about your data, and their knowledge has a cutoff date. The fix is to search your data at query time and include the results in the prompt so the model can answer questions grounded in real information.
The simplest version works fine for easy questions: embed the question, find similar documents in a vector database, include them in the prompt, generate an answer. But it breaks on questions that require more than one step, such as "What changed between our Q2 and Q3 reports?", and on questions where the first vector search returns irrelevant results because the phrasing did not match the indexed content. It also breaks when the answer needs both a database query for structured numbers and a document search for context around those numbers.
Agentic RAG replaces the fixed linear pipeline with a reasoning loop. Instead of always doing "vector search then generate," an agent first thinks about what kind of question this is and what kind of search would actually help. It picks from a set of retrieval tools (dense vector search, SQL, knowledge graph traversal, live web search) and can use several in sequence for complex questions. A separate validation step checks whether the retrieved information actually answers the question. If not, the agent reformulates the query and tries again.
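The loop described above can be sketched as a plain function. Every component here (route, retrieve, validate, rewrite, generate) is a hypothetical callable standing in for an LLM-backed step; only the control flow is the point.

```python
def agentic_rag(question, route, retrieve, validate, rewrite, generate,
                max_iterations=3):
    """Sketch of the agentic RAG control flow, not a full implementation."""
    query = question
    for _ in range(max_iterations):
        tool = route(query)                    # decide which retriever fits
        context = retrieve(tool, query)        # run the chosen search
        answer = generate(question, context)   # draft an answer from context
        if validate(question, context, answer):
            return answer                      # grounded and confident
        query = rewrite(question, query, context)  # reformulate and retry
    return answer  # best effort once the iteration cap is hit
```

Note the `max_iterations` parameter: without it, a question the validator never accepts would loop forever.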
Routing & Planning Agent
The routing agent is the entry point. Its job is to look at a question and decide: what kind of information does this need, where does that information live, and should I search one source or several in sequence? For a simple factual question it might go directly to vector search. For a financial comparison it might query a structured database first for the raw numbers, then search documents for context. For a recent event it might skip the internal knowledge base entirely and go to web search. It also manages the iteration cycle: if a search returned empty or irrelevant results, the routing agent rewrites the query and tries a different approach rather than giving up.
Query Rewriter
When the first search does not return useful results, the rewriter reformulates the query. It might break a complex question into simpler sub-questions, add missing context, or switch from natural-language phrasing to more precise search terms.
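One of those strategies, splitting a comparison into sub-questions, can be shown with a deliberately simple regex. Real systems use an LLM for decomposition; the pattern and sub-question phrasing here are stand-ins.

```python
import re

def decompose(question: str) -> list[str]:
    """Heuristic split for 'between X and Y' comparison questions."""
    m = re.search(r"between (.+?) and (.+?)(\?|$)", question, re.IGNORECASE)
    if m:
        a, b = m.group(1), m.group(2)
        return [f"What does {a} contain?", f"What does {b} contain?"]
    return [question]  # nothing to split: search as-is
```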
Plan Scratchpad
A running log of what has already been searched and what results each search returned. This prevents the agent from running the same search twice and helps it understand what information is still missing.
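The scratchpad is just structured memory of past searches. A minimal version (field names are assumptions) deduplicates queries and exposes which ones came back empty:

```python
class Scratchpad:
    """Running log of (tool, query, result count) for one question."""

    def __init__(self):
        self.entries = []  # list of (tool, query, n_results) tuples

    def log(self, tool: str, query: str, results: list) -> None:
        self.entries.append((tool, query, len(results)))

    def already_tried(self, tool: str, query: str) -> bool:
        # prevents running the identical search twice
        return any(t == tool and q == query for t, q, _ in self.entries)

    def failed_queries(self) -> list[str]:
        # searches that returned nothing: candidates for the rewriter
        return [q for _, q, n in self.entries if n == 0]
```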
Reranker
After retrieval returns a list of candidate results, the reranker re-scores them based on their actual relevance to the specific question, not just their embedding similarity score. This significantly improves what gets passed to the answer generator.
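A toy reranker makes the shape of the step clear: re-score candidates by term overlap with the question and keep the top few. Production rerankers use a cross-encoder model rather than word overlap.

```python
def rerank(question: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Re-order candidates by shared terms with the question, keep top_k."""
    q_terms = set(question.lower().split())

    def score(doc: str) -> int:
        return len(q_terms & set(doc.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:top_k]
```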
Max-Iteration Cap
A hard limit on how many times the agent is allowed to retry before returning a best-effort answer. Without this, an ambiguous question could trigger an infinite loop of searches.
Multi-Source Retrievers
A major limitation of simple RAG is assuming all knowledge lives in one place: a vector database full of embedded documents. Real organizations have knowledge spread across many different types of storage, each of which is the best tool for a different kind of question. Agentic RAG exposes each storage system as a tool that the routing agent can call. The agent picks the right tool for the query, or calls several in sequence if the question spans multiple sources.
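"Exposes each storage system as a tool" usually means a registry mapping tool names to retriever callables, so the agent can dispatch by name. The retrievers below are hypothetical stubs; real ones would wrap a vector store, a SQL engine, a graph database, and a search API.

```python
# Stub retrievers keyed by tool name; contents are placeholders.
TOOLS = {
    "vector_db": lambda q: [f"doc chunk about: {q}"],
    "sql": lambda q: [{"rows": [], "query": q}],
    "knowledge_graph": lambda q: [("entity_a", "relates_to", "entity_b")],
    "web_search": lambda q: [f"live result for: {q}"],
}

def retrieve(tool_name: str, query: str) -> list:
    """Dispatch a query to the named tool; fail loudly on unknown names."""
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](query)
```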
Vector DB
Best for questions about unstructured text: documents, emails, meeting notes, support tickets. Returns results based on semantic similarity rather than exact keyword matching. Works well when you know roughly what you're looking for but not the exact words.
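Under the hood, "semantic similarity" is typically cosine similarity between embedding vectors. A minimal version over plain lists (the embedding model and the approximate-nearest-neighbor index are assumed away):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```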
SQL Engine
Best for precise, structured questions: sales figures, user counts, date ranges, aggregations. The agent writes a query (or uses a text-to-SQL layer) and gets an exact answer rather than a similarity score.
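The "exact answer" point is concrete with an in-memory SQLite table; the schema and rows are invented for the example, and the query is the kind a text-to-SQL layer would generate.

```python
import sqlite3

# In-memory table standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("emea", 100.0), ("emea", 50.0), ("apac", 75.0)])

# An aggregate query returns one exact number, not ranked candidates.
total = conn.execute("SELECT SUM(amount) FROM sales WHERE region = ?",
                     ("emea",)).fetchone()[0]
```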
Knowledge Graph
Best for relationship questions: "how is concept A related to concept B?", "which customers share the same account manager?" Graphs can traverse multi-hop connections that a flat vector index cannot follow.
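A two-hop traversal over a tiny adjacency list shows why: finding customers who share an account manager means walking customer-to-manager-to-customer, a path a similarity search cannot express. The graph data here is made up for illustration.

```python
from collections import deque

# Toy graph: customers connect to their account managers and back.
GRAPH = {
    "cust_a": ["mgr_1"],
    "cust_b": ["mgr_1"],
    "cust_c": ["mgr_2"],
    "mgr_1": ["cust_a", "cust_b"],
    "mgr_2": ["cust_c"],
}

def within_hops(start: str, max_hops: int) -> set[str]:
    """Breadth-first traversal: all nodes reachable within max_hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}
```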
Web Search
Best for recent events, fast-changing information, or topics not covered by internal sources. Results are live and current, but require more careful validation since they are unvetted.
Validator & Self-Correction
The most important question in any RAG system is not "did I find something?" but "did I find the right thing?" A vector database will always return results; whether those results actually answer the question is a separate matter. Without a validation step, the system can confidently generate an answer that sounds plausible but is wrong, because the model filled in gaps from its training data rather than from the retrieved context. A validator agent reads the retrieved chunks and the draft answer together and asks: does the answer follow from what was actually retrieved? Is there anything in the sources that contradicts the answer? Are all the claims traceable to a specific retrieved passage, or did the model invent some of them?
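A toy grounding check conveys the shape of that step: flag answer sentences with no lexical support in any retrieved chunk. A real validator would be an LLM judge or an NLI model; the sentence split and the overlap threshold of two shared terms are crude assumptions.

```python
def ungrounded_sentences(answer: str, chunks: list[str]) -> list[str]:
    """Return answer sentences that share <2 terms with every chunk."""
    flagged = []
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        terms = set(sentence.lower().split())
        supported = any(
            len(terms & set(chunk.lower().split())) >= 2 for chunk in chunks
        )
        if not supported:
            flagged.append(sentence)  # likely invented, not retrieved
    return flagged
```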
Citation Checker
Every statement in the final answer should be traceable to a specific retrieved chunk. The citation checker builds this mapping and flags any claim that is not grounded in the retrieved context; those claims are likely hallucinated.
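The mapping itself can be sketched as claim-to-chunk-index, with `None` marking an unsupported claim. The overlap scoring and `min_overlap` threshold are illustrative assumptions; real checkers use an LLM or entailment model to judge support.

```python
def map_citations(claims: list[str], chunks: list[str],
                  min_overlap: int = 2) -> dict:
    """Map each claim to its best-supporting chunk index, or None."""
    citations = {}
    for claim in claims:
        terms = set(claim.lower().split())
        scores = [len(terms & set(c.lower().split())) for c in chunks]
        best = max(range(len(chunks)), key=scores.__getitem__)
        # below the threshold, treat the claim as ungrounded
        citations[claim] = best if scores[best] >= min_overlap else None
    return citations
```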
Contradiction Detector
Sometimes the model's draft answer contradicts what the retrieved sources actually say: it generated a plausible-sounding but wrong answer despite having the right context. The contradiction detector catches these before they reach the user.
Confidence Scorer
A numerical estimate of how well the retrieved context covers the question. Below a threshold, the system does not trust the answer enough to return it; it triggers another retrieval round instead of giving the user a low-confidence answer.
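A minimal coverage score: the fraction of meaningful question terms that appear anywhere in the retrieved context. The stopword list and the 0.5 threshold are illustrative assumptions, not recommended values.

```python
# Tiny stopword list, just enough for the example.
STOPWORDS = {"the", "a", "an", "is", "what", "how", "of", "in"}

def confidence(question: str, chunks: list[str]) -> float:
    """Fraction of non-stopword question terms found in the context."""
    terms = set(question.lower().split()) - STOPWORDS
    if not terms:
        return 0.0
    context = set(" ".join(chunks).lower().split())
    return len(terms & context) / len(terms)

def should_retry(question: str, chunks: list[str],
                 threshold: float = 0.5) -> bool:
    """Below the threshold, trigger another retrieval round."""
    return confidence(question, chunks) < threshold
```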
Re-Retrieval Loop
When confidence is too low, the agent does not give up: it tries again with a reformulated query, a different tool, or additional context. The loop exits when confidence is high enough, or the iteration cap is reached.