DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

Paper · arXiv 2502.01142 · Published February 3, 2025

Large Language Models (LLMs) have shown remarkable potential in reasoning, yet they still suffer from severe factual hallucinations because of limits in the timeliness, accuracy, and coverage of their parametric knowledge. Meanwhile, integrating reasoning with retrieval-augmented generation (RAG) remains challenging due to ineffective task decomposition and redundant retrieval, which can introduce noise and degrade response quality. In this paper, we propose DeepRAG, a framework that models retrieval-augmented reasoning as a Markov Decision Process (MDP), enabling strategic and adaptive retrieval. By iteratively decomposing queries, DeepRAG dynamically determines at each step whether to retrieve external knowledge or rely on parametric reasoning. Experiments show that DeepRAG improves retrieval efficiency while improving answer accuracy by 21.99%, demonstrating its effectiveness in optimizing retrieval-augmented reasoning.
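The step-wise decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Step`, `run_episode`, and the `toy_*` helpers are hypothetical, and the lookup tables stand in for a real LLM and retriever. The sketch assumes that the state is the question plus the history of answered subqueries, and that each action either stops or issues a subquery together with a retrieve-or-not choice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    subquery: str
    retrieved: bool  # True if external knowledge was fetched for this step
    answer: str


def run_episode(
    question: str,
    next_subquery: Callable[[str, List[Step]], Optional[str]],
    should_retrieve: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> List[Step]:
    """Decompose a question step by step, deciding per step whether to retrieve."""
    history: List[Step] = []
    for _ in range(max_steps):
        subquery = next_subquery(question, history)
        if subquery is None:  # the policy chose to terminate
            break
        use_retrieval = should_retrieve(subquery)
        docs = retrieve(subquery) if use_retrieval else []
        history.append(Step(subquery, use_retrieval, answer(subquery, docs)))
    return history


# Toy stand-ins (assumptions): a fixed decomposition plan, a table of facts the
# "model" already knows, and a table playing the role of the external corpus.
PLAN = ["sub-question A", "sub-question B"]
PARAMETRIC = {"sub-question A": "answer A"}
CORPUS = {"sub-question B": "answer B"}


def toy_next_subquery(question: str, history: List[Step]) -> Optional[str]:
    return PLAN[len(history)] if len(history) < len(PLAN) else None


def toy_should_retrieve(subquery: str) -> bool:
    # Retrieve only when the subquery falls outside parametric knowledge.
    return subquery not in PARAMETRIC


def toy_retrieve(subquery: str) -> List[str]:
    return [CORPUS[subquery]] if subquery in CORPUS else []


def toy_answer(subquery: str, docs: List[str]) -> str:
    return docs[0] if docs else PARAMETRIC.get(subquery, "unknown")


steps = run_episode(
    "toy question", toy_next_subquery, toy_should_retrieve, toy_retrieve, toy_answer
)
for step in steps:
    print(step)
```

In this toy run only the second subquery triggers retrieval, which is the kind of saving that adaptive retrieval aims for compared with retrieving at every step.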

Introduction. Large Language Models (LLMs) have demonstrated significant potential in reasoning (Plaat et al., 2024). However, limited by their capacity and capabilities, LLMs still suffer from severe factual hallucination problems due to the timeliness, accuracy, and coverage of their parametric knowledge (Zhang et al., 2023; Huang et al., 2023). Retrieval-Augmented Generation (RAG) has been proposed as a promising paradigm to address this issue by integrating relevant information from knowledge bases or search engines, thereby improving the factuality of model responses (Zhao et al., 2024). However, integrating reasoning with retrieval-augmented generation still presents several challenges. One major issue is that complex queries often require multi-step decomposition to establish a coherent reasoning process (Radhakrishnan et al., 2023). Iterative retrieval has been proposed as a solution: it continuously updates retrieval results to meet the dynamic information needs that arise during generation (Yue et al., 2024).

Discussion / Conclusion. In this paper, we present DeepRAG, a simple yet effective approach that enhances an LLM’s awareness of its retrieval requirements through self-calibration. Our method decomposes queries into subqueries and uses binary tree search for data synthesis to help models better understand their knowledge boundaries. Experimental results across various QA tasks demonstrate that DeepRAG significantly improves the accuracy and efficiency of retrieval-augmented generation.
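One way to picture the binary tree search mentioned above is as a tree in which every subquery branches into two options, answering from parametric knowledge or retrieving first. The sketch below rests on assumptions: the function name `search_min_retrieval_path` and the `is_correct` callback are hypothetical, and the selection rule (keep the correct path with the fewest retrievals) is not stated in this summary. It enumerates all root-to-leaf paths exhaustively, which is only practical for a handful of subqueries.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def search_min_retrieval_path(
    subqueries: Sequence[str],
    is_correct: Callable[[Tuple[bool, ...]], bool],
) -> Optional[Tuple[bool, ...]]:
    """Return the per-subquery retrieve flags of the cheapest correct path.

    Each path is a tuple of booleans, one per subquery: True means "retrieve",
    False means "answer from parametric knowledge". Returns None if no path
    leads to a correct final answer.
    """
    best: Optional[Tuple[bool, ...]] = None
    for choices in product([False, True], repeat=len(subqueries)):
        if not is_correct(choices):
            continue
        # Booleans sum as 0/1, so sum(choices) counts retrievals on this path.
        if best is None or sum(choices) < sum(best):
            best = choices
    return best


# Toy oracle (assumption): the final answer is only correct when the second
# subquery is answered with retrieval; the others can be answered either way.
subqueries = ["sub-question A", "sub-question B", "sub-question C"]
path = search_min_retrieval_path(subqueries, lambda choices: choices[1])
print(path)
```

A path selected this way marks, for each subquery, whether retrieval was actually needed. Labels of that kind are one plausible form of training signal for a model learning its own knowledge boundary.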

