SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Language, Text, and Discourse · Model Architecture and Internals

Can you adapt retrieval models without accessing target data?

Explores whether dense retrieval systems can adapt to new domains using only a textual description, rather than actual target documents—especially relevant for privacy-restricted or competitive scenarios.

Synthesis note · 2026-02-22 · sourced from RAG

Adapting a dense retrieval model to a new domain normally requires labeled query-document pairs from that domain. In many enterprise contexts, however, the target collection is unavailable: it may not exist yet, it may be legally restricted (medical records, financial data), or sharing it with a model provider would compromise competitive advantage.

The standard assumption, that adapting to a domain requires training on data from that domain, does not hold for retrieval: a brief textual description of the target domain can be sufficient.

The pipeline:
(1) Provide a textual description of the target domain.
(2) Use an instruction-following LLM to extract domain properties: document topics, linguistic attributes, source characteristics, and terminology patterns.
(3) Generate seed documents that match those properties.
(4) Iteratively retrieve real documents that resemble the target domain, using the seed documents as query anchors.
(5) Generate synthetic queries for the resulting collection.
(6) Fine-tune the retrieval model on these queries using pseudo-relevance labels.
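Steps (4) through (6) can be sketched in miniature. This is a toy illustration under stated assumptions, not the method's implementation: it assumes the seed documents from step (3) already exist and that a general-purpose pool is available to retrieve from (the note does not name one). Token-overlap (Jaccard) scoring stands in for a dense retriever, a hypothetical `make_query` callable stands in for the LLM query generator, and choosing the highest-scoring other document as a hard negative is a common convention rather than something the note specifies. All function names and data are hypothetical.

```python
from typing import Callable


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase token sets; a toy stand-in for a dense retriever score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_collection(seeds: list[str], pool: list[str], rounds: int = 2, k: int = 2) -> list[str]:
    """Step (4): iteratively retrieve pool documents that resemble the current anchors."""
    collection: list[str] = []
    anchors = list(seeds)  # round 1 uses the synthetic seed documents as queries
    for _ in range(rounds):
        retrieved: list[str] = []
        for anchor in anchors:
            candidates = [d for d in pool if d not in collection and d not in retrieved]
            ranked = sorted(candidates, key=lambda d: overlap(anchor, d), reverse=True)
            # keep up to k nearest candidates, ignoring ones with no overlap at all
            retrieved.extend(d for d in ranked[:k] if overlap(anchor, d) > 0)
        if not retrieved:
            break
        collection.extend(retrieved)
        anchors = retrieved  # later rounds expand from the newly retrieved documents
    return collection


def pseudo_labels(collection: list[str], make_query: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Steps (5)-(6): one synthetic query per document; its source document is the
    pseudo-positive and the highest-scoring other document is a hard negative."""
    triples = []
    for doc in collection:
        query = make_query(doc)
        others = [d for d in collection if d != doc]
        if not others:
            continue
        negative = max(others, key=lambda d: overlap(query, d))
        triples.append((query, doc, negative))
    return triples


# Hypothetical data: a tiny mixed-domain pool and one synthetic seed for an insurance domain.
pool = [
    "claims adjuster reviews flood damage policy",
    "insurance policy covers flood damage claims",
    "sourdough bread starter recipe",
    "premium increase after insurance claim",
    "football match ends in a draw",
]
seeds = ["insurance claim for flood damage under home policy"]
collection = build_collection(seeds, pool)
triples = pseudo_labels(collection, make_query=lambda d: " ".join(d.split()[:3]))
```

With this toy pool, the recipe and football documents are never pulled into the collection while the insurance documents are. The resulting (query, positive, negative) triples are the kind of input a contrastive fine-tuning loop could consume; a real system would replace `overlap` with the retriever's own embedding similarity.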

Domain understanding is itself retrieval-augmented: at step (2), the domain description serves as a RAG query for extracting structured properties, which then parameterize generation at step (3). The pipeline thus bootstraps from description, through synthesis, to training.
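Steps (2) and (3) can be sketched as a prompt round-trip, assuming the LLM is asked to reply in JSON: the description becomes an extraction prompt, the reply is parsed into a fixed property schema, and each extracted topic becomes a seed-generation prompt constrained by the remaining properties. The schema keys, prompt wording, and `fake_llm` (a canned reply standing in for a real instruction-following model) are all hypothetical, and the retrieval-augmentation part of step (2) is omitted.

```python
import json
from typing import Callable

# Hypothetical property schema mirroring the properties listed in step (2).
PROPERTY_KEYS = ("topics", "linguistic_attributes", "source", "terminology")


def extraction_prompt(description: str) -> str:
    """Step (2): ask the model to turn a free-text description into structured properties."""
    keys = ", ".join(PROPERTY_KEYS)
    return (
        "Read the description of a document collection and return a JSON object "
        f"with the keys {keys}; each value is a list of short strings.\n\n"
        f"Description: {description}"
    )


def extract_properties(description: str, llm: Callable[[str], str]) -> dict[str, list[str]]:
    raw = json.loads(llm(extraction_prompt(description)))
    # keep only the expected keys (missing ones become empty lists)
    return {key: list(raw.get(key, [])) for key in PROPERTY_KEYS}


def generation_prompt(props: dict[str, list[str]], topic: str) -> str:
    """Step (3): one seed-document prompt per topic, constrained by the other properties."""
    return (
        f"Write a document about {topic}. "
        f"Style: {', '.join(props['linguistic_attributes'])}. "
        f"It should read as if it came from: {', '.join(props['source'])}. "
        f"Use terms such as: {', '.join(props['terminology'])}."
    )


def fake_llm(prompt: str) -> str:
    """Canned reply standing in for an instruction-following LLM (ignores the prompt)."""
    return json.dumps({
        "topics": ["claim disputes", "policy renewals"],
        "linguistic_attributes": ["formal", "second person"],
        "source": ["insurer customer emails"],
        "terminology": ["deductible", "premium"],
    })


props = extract_properties("Customer emails sent by a home-insurance company.", fake_llm)
prompts = [generation_prompt(props, topic) for topic in props["topics"]]
```

Restricting the parsed reply to a fixed schema keeps the generation step predictable even if the model adds or omits fields.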

Evaluation on five diverse target domains shows that description-based adaptation outperforms existing dense retrieval baselines in the zero-target-access scenario. The approach enables adaptation in precisely the contexts where conventional adaptation is blocked: privacy-sensitive domains, legally restricted data, competitive scenarios.


Original note title

domain adaptation for retrieval is possible without target collection via description-based synthetic data