Why do queries and documents occupy different embedding spaces?

Queries and documents express the same information in fundamentally different ways: short and interrogative versus long and declarative. This mismatch helps explain why direct embedding retrieval often fails.

Synthesis note · 2026-02-22 · sourced from RAG

The standard embedding retrieval pipeline maps a query directly to a vector and finds nearby document vectors. This assumes that a query and a relevant document occupy nearby regions of the embedding space. They often do not. Queries are short, telegraphic, and interrogative. Relevant documents are long, detailed, and declarative. The same information expressed in query form and document form looks different to an encoder trained on natural language co-occurrence.
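A minimal sketch of the direct pipeline may help make the mismatch concrete. Everything here is an illustrative assumption: `embed` is a toy bag-of-words stand-in for a real dense encoder, and the two-passage corpus and the query are made up. A lexical toy exaggerates the effect, but the shape of the failure is the same: the query lands nearer to text that looks like a question than to text that answers it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy encoder (assumption): bag-of-words counts, not a neural model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=1):
    """Direct retrieval: embed the query itself and rank documents by similarity."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


# Hypothetical corpus: one relevant declarative passage, one irrelevant
# passage that happens to be phrased like a question.
corpus = [
    "Aspirin inhibits cyclooxygenase enzymes, reducing prostaglandin synthesis and inflammation.",
    "How to ask your pharmacist a question: does the store open on weekends?",
]
query = "how does aspirin work?"

top = retrieve(query, corpus)[0]
print(top)
```

With this toy encoder the irrelevant, question-shaped second passage ranks first, because it shares more surface form with the query than the passage that actually answers it.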

HyDE (Hypothetical Document Embeddings) decomposes retrieval into two steps that exploit this asymmetry. First, an instruction-following LLM generates a hypothetical document that would answer the query: not a real document, but something that looks like one. Second, the hypothetical document is embedded, and document-to-document similarity is used to find real corpus matches. The encoder, trained to match documents with documents, now operates in its natural space.
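The two steps can be sketched as follows. This is a sketch under stated assumptions, not the HyDE authors' implementation: `generate_hypothetical_document` is a hypothetical stub that returns canned text where a real system would call an instruction-following LLM, `embed` is a toy bag-of-words stand-in for an unsupervised contrastive encoder, and the corpus is made up.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy encoder (assumption): bag-of-words counts, not a neural model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_hypothetical_document(query):
    """Stub for the LLM call (assumption). A real system would prompt a model
    with something like 'Write a passage that answers: <query>'. The canned
    text is illustrative only and is not guaranteed to be factually accurate."""
    canned = {
        "how does aspirin work?": (
            "Aspirin works by blocking enzymes that produce prostaglandin, "
            "which reduces pain and inflammation."
        ),
    }
    return canned.get(query, query)  # fall back to the raw query


def hyde_retrieve(query, corpus, k=1):
    """Step 1: generate a hypothetical answer. Step 2: embed it and rank
    real documents by document-to-document similarity."""
    hypothetical = generate_hypothetical_document(query)
    h = embed(hypothetical)
    return sorted(corpus, key=lambda d: cosine(h, embed(d)), reverse=True)[:k]


# Hypothetical corpus: one relevant declarative passage, one irrelevant
# passage phrased like a question.
corpus = [
    "Aspirin inhibits cyclooxygenase enzymes, reducing prostaglandin synthesis and inflammation.",
    "How to ask your pharmacist a question: does the store open on weekends?",
]

top = hyde_retrieve("how does aspirin work?", corpus)[0]
print(top)
```

In this toy setup the hypothetical passage shares vocabulary with the declarative passage and none with the question-shaped one, so the relevant document ranks first. The raw query never reaches the encoder; only the generated text does.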

The generated document may be factually wrong; it is, in the FLARE framing, a hallucination on purpose. But factual accuracy is not the goal. The goal is the relevance pattern. The hypothetical document "captures relevance by example": it demonstrates what a relevant document looks like in terms of style, terminology, and structure. The encoder's dense bottleneck acts as a lossy compressor that is expected to filter out hallucinated details while preserving the embedding signature of relevant content.

The implication is that the query is the wrong level of abstraction for retrieval. Queries work well when they are complete enough to uniquely identify relevant content — which is why they succeed on short-form factoid QA but fail on complex or underspecified queries. Hypothetical documents circumvent this by translating the query into the same genre as the targets.

The approach requires no relevance labels and no retrieval-specific fine-tuning, only an instruction-following LLM and an unsupervised contrastive encoder. On 11 query sets spanning web search, question answering, and fact verification, HyDE with InstructGPT and Contriever significantly outperforms the zero-shot baseline that uses no relevance labels.

Inquiring lines that read this note

This note is a source for these research framings.

How should retrieval systems optimize for multi-step reasoning during inference?

Do Doc2Query approaches suffer from the same misaligned-target problem?

Why do semantic similarity and task relevance diverge in vector embeddings?

Related concepts in this collection

Do language models actually build shared understanding in conversation? When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
the grounding gap in dialogue; HyDE is an example of building common ground in retrieval by generating an intermediate representation
Can prompt optimization teach models knowledge they lack? Explores whether sophisticated prompting techniques can inject new domain knowledge into language models, or if they're limited to activating existing training knowledge.
HyDE works because the LLM already has enough knowledge to write a plausible answer; the generation activates a latent representation useful for retrieval
