Summary

A framework for diagnosing production hallucinations in a RAG system by separating retriever failures from generator failures. The core approach is to isolate each component with controlled tests rather than treating the system as a black box.

Key Points

  • Root cause isolation: treat the retriever and generator as independent components and test each with oracle inputs (known-good context for the generator, known-good queries for the retriever) to identify which one is failing
  • Retriever failure signals: relevant documents not retrieved, wrong ranking, context window stuffed with low-relevance chunks
  • Generator failure signals: correct documents retrieved, but the model ignores them, makes up facts that contradict the retrieved context, or over-generalizes from retrieved fragments
  • The question “is the retriever or the generator broken?” is the first diagnostic split; everything else follows from that answer
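
The retriever failure signals above can be turned into numbers once a few queries have hand-labeled relevant documents. The following is a minimal sketch, not a prescribed method: the function names and the document IDs are hypothetical. It maps "relevant documents not retrieved" to recall@k, "context stuffed with low-relevance chunks" to precision@k, and "wrong ranking" to reciprocal rank.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the known-relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant; low values mean noisy context."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical labeled query: the document IDs are made up for illustration.
retrieved = ["d7", "d2", "d9", "d4"]   # retriever output, best first
relevant = {"d2", "d5"}                # hand-labeled ground truth

print(recall_at_k(retrieved, relevant, k=4))     # 0.5  (d5 was never retrieved)
print(precision_at_k(retrieved, relevant, k=4))  # 0.25 (3 of 4 chunks are noise)
print(reciprocal_rank(retrieved, relevant))      # 0.5  (first relevant hit at rank 2)
```

Averaging these over a labeled query set gives a retriever-only baseline that does not depend on the generator at all.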

Insights

The key diagnostic move is to give the generator perfect context (oracle retrieval) and check whether it still hallucinates. If it does, the generator has a faithfulness problem that is independent of retrieval quality. If it does not, the generator is faithful when given good context, and the hallucinations trace back to what the retriever supplies. Production debugging often skips this step and tunes both components at once, which makes it impossible to attribute an improvement or a regression to either one.
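
The oracle-retrieval test can be sketched as a small harness. Everything here is an assumption made for illustration: the `Example` fields, the `generate` and `is_faithful` callables, the choice to judge answers against the known-good passage, and the 0.05 tolerance. A real setup would call the production model and a faithfulness grader in place of the toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    query: str
    oracle_context: str     # hand-picked, known-good passage
    retrieved_context: str  # what the production retriever returned


def hallucination_rate(
    examples: List[Example],
    pick_context: Callable[[Example], str],
    generate: Callable[[str, str], str],
    is_faithful: Callable[[str, str], bool],
) -> float:
    """Share of examples whose answer is not supported by the known-good passage."""
    if not examples:
        return 0.0
    failures = 0
    for ex in examples:
        answer = generate(ex.query, pick_context(ex))
        # Assumption: the oracle passage serves as ground truth in both runs.
        if not is_faithful(answer, ex.oracle_context):
            failures += 1
    return failures / len(examples)


def diagnose(oracle_rate: float, retrieved_rate: float, tolerance: float = 0.05) -> str:
    """First diagnostic split; the tolerance is an arbitrary illustrative value."""
    if oracle_rate > tolerance:
        return "generator"  # hallucinates even with perfect context
    if retrieved_rate > tolerance:
        return "retriever"  # faithful with oracle context, fails with real retrieval
    return "neither"


# Toy stand-ins so the sketch runs end to end.
def echo_generate(query: str, context: str) -> str:
    return context  # a toy generator that repeats its context verbatim


def exact_match(answer: str, reference: str) -> bool:
    return answer == reference


examples = [
    Example("q1", "Paris is the capital of France.", "Paris is the capital of France."),
    Example("q2", "Water boils at 100 C at sea level.", "Bananas are yellow."),
]

oracle_rate = hallucination_rate(
    examples, lambda ex: ex.oracle_context, echo_generate, exact_match
)
retrieved_rate = hallucination_rate(
    examples, lambda ex: ex.retrieved_context, echo_generate, exact_match
)
print(diagnose(oracle_rate, retrieved_rate))  # retriever
```

Note that a "generator" verdict says nothing about the retriever: once faithfulness under oracle context is fixed, the comparison has to be rerun to see whether retrieval is also failing.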

Connections

Raw Excerpt

Your RAG system is hallucinating in production. How do you diagnose what’s broken — the retriever or the generator?