Agentic RAG
Standard RAG searches once and hopes for the best. This pattern replaces the fixed pipeline with a decision-making agent that picks the right search tool for each question, checks whether it found the right answer, and searches again when it is not confident.
Why a single vector search is not enough
Retrieval-Augmented Generation (RAG) was invented to solve a core problem with language models: they do not know about your data, and their knowledge has a cutoff date. The fix is to search your data at query time and include the results in the prompt so the model can answer questions grounded in real information.
The simplest version works fine for easy questions: embed the question, find similar documents in a vector database, include them in the prompt, generate an answer. But it breaks on questions that require more than one step, such as "What changed between our Q2 and Q3 reports?", and on questions where the first vector search returns irrelevant results because the phrasing did not match the indexed content. It also breaks when the answer needs both a database query for structured numbers and a document search for context around those numbers.
Agentic RAG replaces the fixed linear pipeline with a reasoning loop. Instead of always doing "vector search then generate," an agent first thinks about what kind of question this is and what kind of search would actually help. It picks from a set of retrieval tools (dense vector search, SQL, knowledge graph traversal, live web search) and can use several in sequence for complex questions. A separate validation step checks whether the retrieved information actually answers the question. If not, the agent reformulates the query and tries again.
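The loop described above can be sketched as a plain function. Every component here (route, retrieve, validate, rewrite, generate) is a hypothetical callable standing in for an LLM-backed step; only the control flow is the point.

```python
def agentic_rag(question, route, retrieve, validate, rewrite, generate,
                max_iterations=3):
    """Sketch of the agentic RAG control flow, not a full implementation."""
    query = question
    for _ in range(max_iterations):
        tool = route(query)                    # decide which retriever fits
        context = retrieve(tool, query)        # run the chosen search
        answer = generate(question, context)   # draft an answer from context
        if validate(question, context, answer):
            return answer                      # grounded and confident
        query = rewrite(question, query, context)  # reformulate and retry
    return answer  # best effort once the iteration cap is hit
```

Note the `max_iterations` parameter: without it, a question the validator never accepts would loop forever.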
Routing & Planning Agent
The routing agent is the entry point. Its job is to look at a question and decide: what kind of information does this need, where does that information live, and should I search one source or several in sequence? For a simple factual question it might go directly to vector search. For a financial comparison it might query a structured database first for the raw numbers, then search documents for context. For a recent event it might skip the internal knowledge base entirely and go to web search. It also manages the iteration cycle: if a search returned empty or irrelevant results, the routing agent rewrites the query and tries a different approach rather than giving up.
Query Rewriter
When the first search does not return useful results, the rewriter reformulates the query. It might break a complex question into simpler sub-questions, add missing context, or switch from natural-language phrasing to more precise search terms.
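One of those strategies, splitting a comparison into sub-questions, can be shown with a deliberately simple regex. Real systems use an LLM for decomposition; the pattern and sub-question phrasing here are stand-ins.

```python
import re

def decompose(question: str) -> list[str]:
    """Heuristic split for 'between X and Y' comparison questions."""
    m = re.search(r"between (.+?) and (.+?)(\?|$)", question, re.IGNORECASE)
    if m:
        a, b = m.group(1), m.group(2)
        return [f"What does {a} contain?", f"What does {b} contain?"]
    return [question]  # nothing to split: search as-is
```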
Plan Scratchpad
A running log of what has already been searched and what results each search returned. This prevents the agent from running the same search twice and helps it understand what information is still missing.
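The scratchpad is just structured memory of past searches. A minimal version (field names are assumptions) deduplicates queries and exposes which ones came back empty:

```python
class Scratchpad:
    """Running log of (tool, query, result count) for one question."""

    def __init__(self):
        self.entries = []  # list of (tool, query, n_results) tuples

    def log(self, tool: str, query: str, results: list) -> None:
        self.entries.append((tool, query, len(results)))

    def already_tried(self, tool: str, query: str) -> bool:
        # prevents running the identical search twice
        return any(t == tool and q == query for t, q, _ in self.entries)

    def failed_queries(self) -> list[str]:
        # searches that returned nothing: candidates for the rewriter
        return [q for _, q, n in self.entries if n == 0]
```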
Reranker
After retrieval returns a list of candidate results, the reranker re-scores them based on their actual relevance to the specific question, not just their embedding similarity score. This significantly improves what gets passed to the answer generator.
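A toy reranker makes the shape of the step clear: re-score candidates by term overlap with the question and keep the top few. Production rerankers use a cross-encoder model rather than word overlap.

```python
def rerank(question: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Re-order candidates by shared terms with the question, keep top_k."""
    q_terms = set(question.lower().split())

    def score(doc: str) -> int:
        return len(q_terms & set(doc.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:top_k]
```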
Max-Iteration Cap
A hard limit on how many times the agent is allowed to retry before returning a best-effort answer. Without this, an ambiguous question could trigger an infinite loop of searches.
Multi-Source Retrievers
A major limitation of simple RAG is assuming all knowledge lives in one place: a vector database full of embedded documents. Real organizations have knowledge spread across many different types of storage, each of which is the best tool for a different kind of question. Agentic RAG exposes each storage system as a tool that the routing agent can call. The agent picks the right tool for the query, or calls several in sequence if the question spans multiple sources.
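"Exposes each storage system as a tool" usually means a registry mapping tool names to retriever callables, so the agent can dispatch by name. The retrievers below are hypothetical stubs; real ones would wrap a vector store, a SQL engine, a graph database, and a search API.

```python
# Stub retrievers keyed by tool name; contents are placeholders.
TOOLS = {
    "vector_db": lambda q: [f"doc chunk about: {q}"],
    "sql": lambda q: [{"rows": [], "query": q}],
    "knowledge_graph": lambda q: [("entity_a", "relates_to", "entity_b")],
    "web_search": lambda q: [f"live result for: {q}"],
}

def retrieve(tool_name: str, query: str) -> list:
    """Dispatch a query to the named tool; fail loudly on unknown names."""
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](query)
```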
Vector DB
Best for questions about unstructured text: documents, emails, meeting notes, support tickets. Returns results based on semantic similarity rather than exact keyword matching. Works well when you know roughly what you're looking for but not the exact words.
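Under the hood, "semantic similarity" is typically cosine similarity between embedding vectors. A minimal version over plain lists (the embedding model and the approximate-nearest-neighbor index are assumed away):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```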
SQL Engine
Best for precise, structured questions: sales figures, user counts, date ranges, aggregations. The agent writes a query (or uses a text-to-SQL layer) and gets an exact answer rather than a similarity score.
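The "exact answer" point is concrete with an in-memory SQLite table; the schema and rows are invented for the example, and the query is the kind a text-to-SQL layer would generate.

```python
import sqlite3

# In-memory table standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("emea", 100.0), ("emea", 50.0), ("apac", 75.0)])

# An aggregate query returns one exact number, not ranked candidates.
total = conn.execute("SELECT SUM(amount) FROM sales WHERE region = ?",
                     ("emea",)).fetchone()[0]
```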
Knowledge Graph
Best for relationship questions: "how is concept A related to concept B?", "which customers share the same account manager?" Graphs can traverse multi-hop connections that a flat vector index cannot follow.
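A two-hop traversal over a tiny adjacency list shows why: finding customers who share an account manager means walking customer-to-manager-to-customer, a path a similarity search cannot express. The graph data here is made up for illustration.

```python
from collections import deque

# Toy graph: customers connect to their account managers and back.
GRAPH = {
    "cust_a": ["mgr_1"],
    "cust_b": ["mgr_1"],
    "cust_c": ["mgr_2"],
    "mgr_1": ["cust_a", "cust_b"],
    "mgr_2": ["cust_c"],
}

def within_hops(start: str, max_hops: int) -> set[str]:
    """Breadth-first traversal: all nodes reachable within max_hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}
```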
Web Search
Best for recent events, fast-changing information, or topics not covered by internal sources. Results are live and current, but require more careful validation since they are unvetted.
Validator & Self-Correction
The most important question in any RAG system is not "did I find something?" but "did I find the right thing?" A vector database will always return results; whether those results actually answer the question is a separate matter. Without a validation step, the system can confidently generate an answer that sounds plausible but is wrong, because the model filled in gaps from its training data rather than from the retrieved context. A validator agent reads the retrieved chunks and the draft answer together and asks: does the answer follow from what was actually retrieved? Is there anything in the sources that contradicts the answer? Are all the claims traceable to a specific retrieved passage, or did the model invent some of them?
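A toy grounding check conveys the shape of that step: flag answer sentences with no lexical support in any retrieved chunk. A real validator would be an LLM judge or an NLI model; the sentence split and the overlap threshold of two shared terms are crude assumptions.

```python
def ungrounded_sentences(answer: str, chunks: list[str]) -> list[str]:
    """Return answer sentences that share <2 terms with every chunk."""
    flagged = []
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        terms = set(sentence.lower().split())
        supported = any(
            len(terms & set(chunk.lower().split())) >= 2 for chunk in chunks
        )
        if not supported:
            flagged.append(sentence)  # likely invented, not retrieved
    return flagged
```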
Citation Checker
Every statement in the final answer should be traceable to a specific retrieved chunk. The citation checker builds this mapping and flags any claim that is not grounded in the retrieved context; those claims are likely hallucinated.
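The mapping itself can be sketched as claim-to-chunk-index, with `None` marking an unsupported claim. The overlap scoring and `min_overlap` threshold are illustrative assumptions; real checkers use an LLM or entailment model to judge support.

```python
def map_citations(claims: list[str], chunks: list[str],
                  min_overlap: int = 2) -> dict:
    """Map each claim to its best-supporting chunk index, or None."""
    citations = {}
    for claim in claims:
        terms = set(claim.lower().split())
        scores = [len(terms & set(c.lower().split())) for c in chunks]
        best = max(range(len(chunks)), key=scores.__getitem__)
        # below the threshold, treat the claim as ungrounded
        citations[claim] = best if scores[best] >= min_overlap else None
    return citations
```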
Contradiction Detector
Sometimes the model's draft answer contradicts what the retrieved sources actually say: it generated a plausible-sounding but wrong answer despite having the right context. The contradiction detector catches these before they reach the user.
Confidence Scorer
A numerical estimate of how well the retrieved context covers the question. Below a threshold, the system does not trust the answer enough to return it; it triggers another retrieval round instead of giving the user a low-confidence answer.
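A minimal coverage score: the fraction of meaningful question terms that appear anywhere in the retrieved context. The stopword list and the 0.5 threshold are illustrative assumptions, not recommended values.

```python
# Tiny stopword list, just enough for the example.
STOPWORDS = {"the", "a", "an", "is", "what", "how", "of", "in"}

def confidence(question: str, chunks: list[str]) -> float:
    """Fraction of non-stopword question terms found in the context."""
    terms = set(question.lower().split()) - STOPWORDS
    if not terms:
        return 0.0
    context = set(" ".join(chunks).lower().split())
    return len(terms & context) / len(terms)

def should_retry(question: str, chunks: list[str],
                 threshold: float = 0.5) -> bool:
    """Below the threshold, trigger another retrieval round."""
    return confidence(question, chunks) < threshold
```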
Re-Retrieval Loop
When confidence is too low, the agent does not give up: it tries again with a reformulated query, a different tool, or additional context. The loop exits when confidence is high enough, or the iteration cap is reached.