The Dirty Secret of Enterprise RAG
Retrieval-Augmented Generation has become the default pattern for enterprise AI. Company has proprietary data, company wants AI to use that data, company implements RAG. Simple in concept. Agonizing in execution.
We have audited RAG implementations at over twenty organizations in the past year. The median implementation retrieves irrelevant documents 30 to 40% of the time, provides no mechanism for users to verify sources, and has never been systematically evaluated against a quality benchmark.
These systems are in production. Employees are making decisions based on their outputs. Nobody is measuring how often those outputs are wrong.
Where RAG Implementations Fail
- Chunking strategy is an afterthought. Most teams split documents into fixed-size chunks without considering document structure, semantic coherence, or retrieval requirements. A 512-token chunk that splits a table in half is worse than useless. It is actively misleading.
- Embedding models are chosen by default, not by evaluation. Teams use whatever embedding model the tutorial recommended without benchmarking it on their actual data. Domain-specific content (legal, medical, financial) often requires specialized embeddings or, at minimum, careful evaluation of general-purpose ones.
- Retrieval is treated as plain similarity search. Most RAG implementations run a single naive vector similarity query and stop there. This misses relevant documents that use different terminology, fails on queries that require reasoning about relationships between documents, and returns near-duplicate results from similar documents.
- The generation step ignores retrieval quality. The model receives retrieved documents and generates an answer, with no mechanism to assess whether the retrieved documents actually contain the information needed. Garbage in, confident garbage out.
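One simple way to stop "confident garbage" is a gate between retrieval and generation. The sketch below is illustrative only: the `Retrieved` type, the `min_score` threshold of 0.55, and the `generate` callable are all assumptions, and a similarity score is a crude proxy for "contains the answer". A graded relevance check would be stricter, but even this much lets the system decline instead of guessing.

```python
from dataclasses import dataclass

# Hypothetical message returned when nothing relevant was retrieved.
ABSTAIN = "I could not find a source that answers this question."


@dataclass
class Retrieved:
    doc_id: str
    text: str
    score: float  # assumed: similarity in [0, 1], higher is better


def gate_context(results, min_score=0.55, min_docs=1):
    """Keep results at or above min_score; return None if too few survive."""
    kept = [r for r in results if r.score >= min_score]
    return kept if len(kept) >= min_docs else None


def answer(question, results, generate):
    """Call the generator only when the retrieved context passes the gate."""
    context = gate_context(results)
    if context is None:
        return ABSTAIN
    # `generate` stands in for whatever LLM call the system uses.
    return generate(question, context)
```

The threshold should be tuned on your own data, not copied from here.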
What Good RAG Looks Like
Intelligent chunking. Respect document structure. Use semantic boundaries. Overlap chunks to avoid information loss at boundaries. Index metadata alongside content so retrieval can filter by document type, date, author, or other attributes.
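As a rough sketch of what that can look like, the function below packs whole paragraphs into chunks, carries the last paragraph of each chunk into the next as overlap, and copies document metadata onto every chunk. It assumes paragraphs are separated by blank lines and uses a word count as a stand-in for tokens. The function name and parameters are illustrative, not taken from any particular library.

```python
def chunk_document(text, metadata, max_words=200, overlap_paragraphs=1):
    """Split text on paragraph boundaries into overlapping chunks.

    A paragraph is never cut in half, so a chunk may exceed max_words
    when a single paragraph (or a table kept as one block) is long.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        candidate = current + [para]
        too_long = sum(len(p.split()) for p in candidate) > max_words
        if current and too_long:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one.
            overlap = current[-overlap_paragraphs:] if overlap_paragraphs else []
            current = overlap + [para]
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Attach metadata to every chunk so retrieval can filter on it.
    return [
        {"text": "\n\n".join(c), "chunk_index": i, **metadata}
        for i, c in enumerate(chunks)
    ]
```

Real documents need a structure-aware parser in front of this (headings, tables, lists), but the principle is the same: boundaries come from the document, not from a fixed token count.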
Hybrid retrieval. Combine vector search with keyword search. Use re-ranking models to improve precision. Implement query expansion to catch relevant documents that use different terminology.
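One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF), which needs only the rank positions and not comparable scores. The sketch below assumes each retriever has already returned an ordered list of document IDs. The constant `k=60` is the value usually quoted for RRF, not something tuned for your corpus.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranked list.

    Each document scores 1 / (k + rank) per list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by doc ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # hypothetical vector search output
keyword_hits = ["c", "a", "d"]  # hypothetical keyword search output
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

The fused list is what a re-ranking model would then reorder. Fusion improves coverage, and re-ranking improves precision at the top.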
Retrieval evaluation. Measure retrieval quality independently from generation quality. Track retrieval precision and recall on a test set. If you cannot find the right documents, no amount of prompt engineering will fix the output.
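For a single query, both metrics reduce to a few lines. The sketch below uses the standard definitions of precision@k and recall@k and assumes you already have a set of known-relevant document IDs for the query. The function name is illustrative.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for one query.

    retrieved: ranked list of doc IDs from the retriever
    relevant:  set of doc IDs known to answer the query
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Using a set means a duplicated doc ID is not counted twice.
    hits = len(set(retrieved[:k]) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Average these over the whole test set and track them across every change to chunking, embeddings, or retrieval logic.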
RAG is not a solved problem. It is a system design challenge that requires the same rigor as any production data pipeline. Treating it as a tutorial exercise produces tutorial-quality results.
The Fix
Build a retrieval evaluation dataset: 100 questions with known source documents. Measure your current retrieval accuracy against it: how often a known source document appears in the top results. The number will probably horrify you. That horror is the beginning of improvement.
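A minimal harness for that measurement might look like the following. The dataset layout (a `question` plus a list of `source_ids`), the `retrieve` callable, and the choice of hit rate at `k=5` are all assumptions made for illustration. Adapt them to whatever your retriever actually returns.

```python
def evaluate_retriever(dataset, retrieve, k=5):
    """Return (hit_rate, missed_questions) over an evaluation dataset.

    dataset:  list of {"question": str, "source_ids": [doc IDs]}
    retrieve: callable mapping a question to a ranked list of doc IDs
    A question counts as a hit if any known source is in the top k.
    """
    if not dataset:
        raise ValueError("dataset is empty")
    misses = []
    for example in dataset:
        top_k = retrieve(example["question"])[:k]
        if not set(top_k) & set(example["source_ids"]):
            misses.append(example["question"])
    hit_rate = 1 - len(misses) / len(dataset)
    return hit_rate, misses
```

The list of missed questions matters as much as the score: reading through it is usually the fastest way to see whether the problem is chunking, embeddings, or terminology mismatch.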