
RAG Is Search Plus Paste — And the Search Is What's Broken

5 min read

RAG (Retrieval-Augmented Generation) might be the most over-engineered name in all of AI. Three words that make document retrieval sound like a breakthrough. But strip the acronym, and you're left with something every developer has built before: search a database, grab relevant text, paste it into a prompt.

## The Conventional Wisdom

The AI industry sells RAG as the solution to hallucinations, the technique that "grounds" your AI in real data. And the marketing works. Enterprise RAG deployments nearly tripled in 2025. Every chatbot vendor, every knowledge management platform, every "AI-powered search" product has RAG somewhere in the pitch deck.

That is because hallucinations are expensive. Enterprise losses from AI hallucinations hit $67.4 billion in 2024. When your legal AI hallucinates case citations (and Stanford found hallucination rates of 17-33% even with RAG in legal applications), the cost isn't just embarrassment. It's liability.

And RAG genuinely helps. Industry benchmarks show RAG reduces hallucination rates by up to 71% when properly implemented. Grounded retrieval can lower hallucinations to less than 2% in summarization tasks. The technology works.

## The Contrarian Take: It's Search, Not AI

Here's what most people miss: RAG has three steps, and only one of them involves AI.

1. **Retrieve**: search a database for relevant documents. This is vector similarity search, BM25 keyword matching, or a hybrid. It's a database query.
2. **Augment**: paste the retrieved text into the prompt. This is string concatenation. Literally joining text.
3. **Generate**: send the augmented prompt to the LLM. This is the same LLM call you'd make without RAG.

The "retrieval" is search technology that's existed since the 1990s. The "augmentation" is paste. The "generation" is the same model call.

For example, when your chatbot answers a question about company policy, the RAG pipeline converts your question to a vector, finds the closest matching document chunks, and pastes them into the prompt ahead of your question. The LLM never "knows" your policies. It reads them in real time, exactly like you'd read a document before answering a question. Sophisticated? Yes. Magic? No.

## The Evidence: Retrieval Is the Bottleneck

When RAG fails, and it fails frequently, the failure point is retrieval 73% of the time, not generation. The LLM isn't confused. The search didn't find the right document. Naive RAG pipelines fail at retrieval roughly 40% of the time.

That is because RAG's "intelligence" is actually a search quality problem. The bottleneck isn't the AI. It's the database. Stale documents, bad chunking strategies, missing metadata, and poor embeddings are what kill RAG pipelines. As one engineering analysis put it, "chunking is where most RAG pipelines silently fail."

The domain-specific data makes this concrete. Enterprise hallucination benchmarks show rates of 15-52% across commercial LLMs. In legal applications, Stanford found RAG still hallucinates 17-33% of the time. In medical AI, hallucination rates range from 43-64% depending on prompt quality. RAG doesn't eliminate hallucinations. It reduces them, and only when the retrieval step actually finds the right documents.

## The "RAG Is Dead" Debate Proves the Point

The "RAG is dead" conversation strengthens the argument. Every few months, someone declares RAG obsolete because context windows keep getting longer: Gemini and Claude both offer context windows in the 1-million-token range. Why search and paste when you can just paste everything?

But enterprise RAG deployments nearly tripled in 2025. That is because RAG isn't a technology breakthrough. It's an engineering pattern. And engineering patterns don't die when the underlying technology improves. They evolve.
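
To make the "engineering pattern" point concrete, here is a minimal sketch of the retrieve, augment, generate steps in plain Python. Everything in it is an assumption for illustration: the `DOCS` corpus is made up, the word-overlap scoring is a toy stand-in for BM25 or vector similarity, and `generate` is a stub that only marks where a real LLM call would go.

```python
import re

# Hypothetical policy snippets standing in for an indexed document store.
DOCS = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports are due within 30 days of purchase.",
    "Remote work requires manager approval.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1, Retrieve: rank docs by word overlap with the question.

    Toy scoring only; a real system would use BM25, embeddings, or a hybrid.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the question.
    return [d for score, d in scored[:k] if score > 0]

def augment(question: str, chunks: list[str]) -> str:
    """Step 2, Augment: string concatenation, nothing more."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Step 3, Generate: placeholder for the same LLM call you'd make without RAG."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

question = "How many vacation days do employees get?"
chunks = retrieve(question, DOCS)
prompt = augment(question, chunks)
print(prompt)
print(generate(prompt))
```

Swap the scoring function for a better one and the rest of the pipeline doesn't change, which is roughly the argument: the interesting engineering lives in `retrieve`.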
The winning implementations in 2025-2026 use vector retrieval to identify relevant context, then use long-context windows to reason across that retrieved context. It was never either/or.

The debate itself reveals the simplicity: if RAG were truly sophisticated AI, you wouldn't argue about replacing it with "just give the model more text." You'd argue about replacing it with better AI. The fact that longer context windows are the competition tells you exactly what RAG is: a way to select which text to paste.

## Why This Matters: Fix the Search, Fix the AI

When I built MyClaw, my RAG pipeline hallucinated because my document chunks were too large and my metadata was inconsistent. The LLM was fine. The search was broken. And the moment I fixed the chunking strategy and added proper metadata tags, the hallucination rate dropped dramatically.

The same pattern repeats across the industry. Only 17% of organizations attribute more than 5% of EBIT to GenAI, despite 71% reporting regular use. The gap isn't the AI. It's the infrastructure. Clean documents, good chunking, proper metadata, accurate embeddings. That's what makes RAG work. Not a fancier model.

## The RAG Reality Checklist

In a nutshell, here's how to evaluate any RAG implementation:

| Component | What to Ask | Red Flag |
|-----------|-------------|----------|
| Retrieval | What search method? Vector, BM25, hybrid? | "We use AI search" = vague |
| Chunking | How are documents split? | "Automatic" = untuned = failures |
| Context budget | How many tokens for retrieved docs? | No answer = no optimization |
| Fallback | What happens when retrieval finds nothing? | No answer = hallucination risk |
| Freshness | How often is the index updated? | "Weekly" for daily-changing data = stale |

If a vendor talks about their RAG "intelligence" but can't answer these five questions, they're selling you a search index with a nice name.
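
Two rows of that checklist, context budget and fallback, are easy to show in code. The sketch below is one possible shape, not a reference implementation: the `(score, text)` input format, the `min_score` threshold, the rough 4-characters-per-token estimate, and the function names are all assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)

def select_context(hits, budget_tokens=50, min_score=0.5):
    """Keep the best-scoring chunks that clear min_score and fit the budget.

    hits: list of (score, text) pairs from whatever retriever is in use.
    """
    selected, used = [], 0
    for score, text in sorted(hits, key=lambda h: h[0], reverse=True):
        if score < min_score:
            break  # hits are sorted, so everything after this is weaker still
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip chunks that would push us over the budget
        selected.append(text)
        used += cost
    return selected

def build_prompt(question, hits):
    """Return a prompt, or None to signal the fallback path."""
    context = select_context(hits)
    if not context:
        return None  # fallback: don't let the model guess without sources
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}"

# Hypothetical retriever output: one strong match, one weak one.
hits = [
    (0.82, "Refunds are issued within 14 days of a return."),
    (0.31, "Our office is closed on public holidays."),
]
prompt = build_prompt("How long do refunds take?", hits)
print(prompt if prompt else "I couldn't find that in the documents.")

# Nothing clears the threshold, so the caller gets None and can decline to answer.
no_match = build_prompt("What is the CEO's favorite color?", [(0.12, "Unrelated text.")])
print(no_match if no_match else "I couldn't find that in the documents.")
```

The point of returning `None` instead of an empty context is that "retrieval found nothing" becomes an explicit branch someone has to handle, not a silent prompt with no sources in it.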
## Your Turn

When was the last time a vendor showed you their RAG system's retrieval accuracy, not just its final answer quality? And do you think the "search plus paste" framing is fair or reductive?

I'm betting that once you see the retrieval step as a search problem, debugging gets easier. You stop blaming the model and start fixing the index.
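
One way to start is to score retrieval on its own, before any model is involved. Below is a minimal sketch of hit rate at k: for each test query, did any document that should answer it show up in the top k results? The queries, document IDs, and the `hit_rate_at_k` name are made up for illustration; a real evaluation set would come from your own questions and your own index.

```python
def hit_rate_at_k(results, expected, k=3):
    """Fraction of queries whose top-k results contain an expected doc ID.

    results:  {query: ranked list of retrieved doc IDs}
    expected: {query: set of doc IDs that would answer the query}
    """
    if not expected:
        return 0.0
    hits = 0
    for query, relevant in expected.items():
        top_k = results.get(query, [])[:k]
        if relevant & set(top_k):
            hits += 1
    return hits / len(expected)

# Hypothetical labeled queries: which documents *should* come back.
expected = {
    "vacation policy": {"hr-12"},
    "expense deadline": {"fin-03"},
    "remote work rules": {"hr-40", "hr-41"},
    "laptop refresh cycle": {"it-07"},
}

# Hypothetical retriever output for the same queries, best match first.
results = {
    "vacation policy": ["hr-12", "hr-02", "fin-03"],
    "expense deadline": ["hr-02", "fin-09", "fin-03"],
    "remote work rules": ["it-01", "hr-02", "fin-09"],
    "laptop refresh cycle": ["it-07", "it-01", "hr-12"],
}

print(hit_rate_at_k(results, expected, k=3))  # 0.75
print(hit_rate_at_k(results, expected, k=1))  # 0.5
```

A number like this won't tell you whether the final answer was good, but it will tell you whether the model ever had a chance.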