Do hierarchical retrieval architectures outperform flat ones on complex queries?
Explores whether separating query planning from answer synthesis into distinct architectural components improves performance on multi-hop retrieval tasks compared to unified single-pass approaches.
HierSearch separates two functions that flat retrieval architectures conflate: deciding what to search for (query planning) and deciding what the answer is (answer synthesis). The finding is that these functions interfere with each other when combined, and separating them improves multi-hop query performance.
The interference mechanism: in a flat architecture, the model must simultaneously track what it is looking for, what it has found, and how the findings combine into an answer. Multi-hop queries require multiple retrieval rounds with intermediate synthesis steps: each round's findings must inform the next round's query while also contributing to the final answer. When a single component handles all of this, it tends to lose coherence across the chain. The hierarchical architecture assigns query planning to one component and answer synthesis to another, so each can specialize.
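The separation can be made concrete with a minimal Python sketch. It assumes a toy in-memory corpus (`CORPUS`) and hand-written hop templates standing in for a learned planner; `Planner`, `Synthesizer`, `retrieve`, and `hierarchical_search` are hypothetical names for illustration, not HierSearch's implementation. What matters is the interface: the planner sees the evidence trail and emits only the next sub-query, while the synthesizer sees only the evidence and emits only the answer.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy corpus standing in for a retriever: sub-query -> fact.
CORPUS = {
    "director of Inception": "Christopher Nolan",
    "birthplace of Christopher Nolan": "London",
}


@dataclass
class Evidence:
    query: str
    fact: str


class Planner:
    """Decides only what to search for next; never composes the answer."""

    def __init__(self, hop_templates: list[str], seed: str):
        self.hop_templates = hop_templates  # hypothetical stand-in for a learned planner
        self.seed = seed

    def next_query(self, evidence: list[Evidence]) -> Optional[str]:
        # Stop if the plan is exhausted or the last hop found nothing.
        if len(evidence) >= len(self.hop_templates):
            return None
        if evidence and evidence[-1].fact == "UNKNOWN":
            return None
        # Each hop is grounded on the previous hop's finding (or the seed entity).
        subject = evidence[-1].fact if evidence else self.seed
        return self.hop_templates[len(evidence)].format(subject)


def retrieve(query: str) -> str:
    return CORPUS.get(query, "UNKNOWN")


class Synthesizer:
    """Decides only what the answer is, given the full evidence chain."""

    def answer(self, evidence: list[Evidence]) -> str:
        return evidence[-1].fact if evidence else "UNKNOWN"


def hierarchical_search(planner: Planner, synthesizer: Synthesizer, max_rounds: int = 5):
    evidence: list[Evidence] = []
    for _ in range(max_rounds):
        query = planner.next_query(evidence)
        if query is None:  # planning is done: hand off to synthesis
            break
        evidence.append(Evidence(query, retrieve(query)))
    return synthesizer.answer(evidence), evidence


# Two-hop question: "Where was the director of Inception born?"
planner = Planner(["director of {}", "birthplace of {}"], seed="Inception")
answer, trail = hierarchical_search(planner, Synthesizer())
print(answer)
```

In a flat design, one function would both choose each query and maintain the answer state. Here each component's inputs and outputs are narrow, which is the property the hierarchical argument relies on.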
This has implications beyond deep research. A similar interference between planning and execution appears in agent design, where models that plan and execute in a single pass tend to produce worse plans and worse execution than models that separate the two. HierSearch can be read as a retrieval-specific instance of this more general architectural principle.
The structural finding also connects to the note "How do readers track segments, purposes, and salience together?": HierSearch addresses at the system level the same parallel-tracking problem that note describes at the cognitive level. The discourse-level problem (tracking segments, purposes, and salient objects in parallel) is analogous to the retrieval-level problem (tracking query intent, retrieved evidence, and synthesis state in parallel). Separating these functions architecturally reduces the tracking burden on any one component.
LogicRAG extends the hierarchical principle by making the query planning step structurally explicit: at inference time it decomposes the query into a directed acyclic graph (DAG) of subproblems, then resolves them in topological order. Where HierSearch separates planning from synthesis at the system level, LogicRAG implements the planning step as a structured dependency graph at the query level. The result is a query-adaptive logic structure with no corpus pre-processing cost. See the related note "Can query-time graph construction replace pre-built knowledge graphs?".
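The DAG-then-topological-order idea can be sketched with Python's standard-library `graphlib.TopologicalSorter`. In LogicRAG the subproblem graph is produced by the model at inference time; here it is hand-written, and the `FACTS` lookup is a hypothetical stand-in for retrieval, so treat this as an illustration of the resolution order rather than LogicRAG's method.

```python
from graphlib import TopologicalSorter

# Hypothetical toy knowledge source standing in for a retriever.
FACTS = {
    "director of Inception": "Christopher Nolan",
    "director of Jaws": "Steven Spielberg",
    "birth country of Christopher Nolan": "United Kingdom",
    "birth country of Steven Spielberg": "United States",
}

# Hand-written subproblem DAG for "Were the directors of Inception and Jaws
# born in the same country?": node id -> (query template, dependency node ids).
SUBPROBLEMS = {
    "d1": ("director of Inception", []),
    "d2": ("director of Jaws", []),
    "c1": ("birth country of {d1}", ["d1"]),
    "c2": ("birth country of {d2}", ["d2"]),
}


def resolve_dag(subproblems, lookup):
    # graphlib expects node -> iterable of predecessors.
    graph = {node: deps for node, (_, deps) in subproblems.items()}
    answers = {}
    # static_order() yields every node after all of its predecessors,
    # so each template can be filled with already-resolved answers.
    for node in TopologicalSorter(graph).static_order():
        template, deps = subproblems[node]
        query = template.format(**{d: answers[d] for d in deps})
        answers[node] = lookup(query)
    return answers


answers = resolve_dag(SUBPROBLEMS, lambda q: FACTS.get(q, "UNKNOWN"))
# Final synthesis step, kept outside the DAG resolver.
same_country = answers["c1"] == answers["c2"]
print(answers["c1"], answers["c2"], same_country)
```

Because `d1` and `d2` have no dependencies, a real system could resolve them concurrently; `TopologicalSorter` also exposes `prepare()`, `get_ready()`, and `done()` for that kind of batch-by-batch scheduling.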
Inquiring lines that use this note as a source
This note is a source for these synthesized inquiries.
- What makes query complexity a better routing signal than response quality?
- How does ranking-aligned summarization compare to aspect-controlled generation methods?
- Can retrieval improve multi-step reasoning by triggering at each uncertainty?
- Can task-aware ranking replace similarity scoring in other RAG systems?
- What makes reranking during retrieval better than catching failures at plan time?
- How should we allocate compute between reasoning and retrieval iterations?
- Does parallel retrieval outperform sequential search chains at test time?
- What makes proactive tool retrieval better than single-round semantic matching?
- How does semantic search over research papers guide autonomous architecture proposals?
- What mathematical limits constrain embedding-based retrieval systems?
- Can embedding-based retrieval alone solve the causal relevance problem?
- How does structure-aware retrieval routing differ from existing graph-versus-vector RAG tradeoffs?
- How does hierarchical query planning versus flat prompting affect multi-source retrieval?
- Why does selective context retrieval outperform including all historical information?
- Do single-step retrieval systems with sophisticated synthesis qualify as deep research?
- Why do hierarchical architectures better implement the deep research definition?
- How do real search queries reveal what counts as a deep research question?
- How does retrieval-augmented generation extract structured properties from domain descriptions?
- What makes web retrieval more effective than static knowledge bases?
- What makes retrieval augmentation more effective than simply increasing embedding size?
- How should query augmentation strategies be properly evaluated against baselines?
- How do hierarchical architectures separate planning from retrieval differently than flat ones?
- Why does community detection in knowledge graphs outperform pure retrieval or pure summarization?
- Can hierarchical entity extraction from books enable both textual and visual reasoning?
- How do community-based summaries differ from retrieval-based traversal in knowledge graph RAG?
- What makes hierarchical community summaries useful for exploration without a specific question?
- How does map-reduce over communities compare to flat multi-hop retrieval architectures?
- How does query planning as a separate step improve multi-hop retrieval coherence?
- How does test-time search budget efficiency benefit from hierarchical architectures?
- Can inference-time query decomposition replace pre-built knowledge graph structures?
- How does hypergraph accumulation differ from single-pass graph retrieval?
- How do hierarchical query planning architectures improve multi-hop retrieval?
- When should relational graph traversal replace vector embedding retrieval?
- Which knowledge structure types best fit different query types?
- Can step-level rewards improve training of agentic retrieval systems?
- What documents improve answers beyond surface query similarity?
- How does retrieval-augmented generation create topically redundant content patterns?
- Why does Personalized PageRank naturally discover concepts multiple hops from query seeds?
- Can query-time logic graphs match the efficiency of pre-built knowledge graph indexing?
- Should production CRS systems combine multiple retrieval strategies in a hybrid approach?
- Can explicit linkers replace vector similarity for multi-step question answering?
- Can parallel retrieval chains avoid the context consumption problem?
- How do hierarchical knowledge graphs solve similar multimodal retrieval problems in books?
- How do cascaded probabilistic models compare to reinforcement learning for per-query system design?
- Does the parallel versus sequential trade-off appear in retrieval-augmented generation systems?
- Why does single-round retrieval fail on multi-step tasks across different domains?
- Can hierarchical vector routing reduce context overhead while maintaining tool coverage?
- When do queries fail to capture relevance patterns effectively?
- Can graph-based retrieval with knowledge graphs scale to multi-hop reasoning?
- Why do question types determine retrieval and decomposition strategy in QA?
- What limits exist on retrieval budget during inference?
- How can knowledge graphs improve over pure embedding retrieval?
- Can retrieval strategies drive both draft refinement and new research question generation?
- How do retrieval heads interact with layer-level separation of knowledge and reasoning?
- Can generator feedback backpropagate through the entire retrieval pipeline?
- How do taxonomy-based retrieval scaffolds improve model performance at inference time?
- Can hierarchical key point structures improve opinion summarization?
- Can knowledge graph structure be exploited for efficient multi-hop retrieval?
- How does proactive information-gathering capability differ from passive knowledge retrieval?
- Can tree search improve question generation the way it improves reasoning?
- How does reflection-based query refinement differ from single-pass retrieval strategies?
- Do expansion-reflection loops and chain-of-retrieval approaches solve the same problem?
- Can re-ranking and advanced chunking fix embedding retrieval failures?
- Do graph databases outperform embeddings for relational retrieval tasks?
- How does description-based bridging compare to affordance-aware reranking for retrieval?
- Can embedding-cluster routing outperform a single frontier model?
- How do parallel and sequential retrieval strategies compare in compute efficiency?
- Can separating token weighting from query filtering reduce reward hacking?
- How do hierarchical architectures improve multi-hop query performance?
- Why do deep research agents outperform retrieval augmented generation systems?
- How should retrieval and verification tasks be separated architecturally?
- Can knowledge graphs built at inference time outperform pre-built retrieval augmented generation?
- Can single-hop knowledge automatically compose into multi-hop capability?
- Does uncertainty trigger retrieval better than fixed-interval tool calls?
- What distinguishes iterative query refinement from pure self-revision loops?
- Can stateless multi-step retrieval capture evidence integration as well as dynamic memory?
- What makes graph databases better than embeddings for relational queries?
- Can adaptive per-step decisions outperform uniform retrieval policies across different reasoning tasks?
- What role does document reranking play alongside decisions about whether to retrieve?
- How do hierarchical research architectures improve multi-hop query accuracy?
- Can sparse attention methods be designed specifically for multi-hop reasoning tasks?
- Why do aggregation tasks degrade faster than multi-hop reasoning under sparsity?
- How should retrieval systems handle multi-hop reasoning and iterative information needs?
- What makes hierarchical reasoning effective for taxonomy induction?
- Can a single recursive network replace hierarchical dual-network architectures?
- Does retrieval quality depend more on access structure or write gating?
- What would instruction-following retrieval enable that query-only systems cannot?
- How does temporal grounding in retrieval compare to architectural approaches?
Related concepts in this collection
- Does search budget scale like reasoning tokens for answer quality?
Explores whether the test-time scaling law that applies to reasoning tokens also governs search-based retrieval in agentic systems. Understanding this relationship could reshape how we allocate inference compute between thinking and searching.
extends: hierarchical architecture makes the search budget more efficient by reducing interference loss
- How do readers track segments, purposes, and salience together?
Can discourse processing actually happen in parallel rather than sequentially? This matters because understanding how readers coordinate multiple layers of meaning at once reveals where AI systems break down in comprehension.
connects: HierSearch solves at system architecture level the same parallel-tracking problem that discourse processing requires at the cognitive level
Related papers in this collection
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Chain-of-Retrieval Augmented Generation
- RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
- You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
- Deep Research: A Systematic Survey
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Original note title
hierarchical research architectures that separate query planning from answer synthesis outperform flat architectures on multi-hop queries