SYNTHESIS NOTE

Why can't language models reverse learned facts?

Language models trained on directional statements like "A is B" often fail to answer the reverse query. This note explores why symmetric relations aren't automatically learned during training, despite appearing throughout the data.

Synthesis note · 2026-02-23 · sourced from Flaws

If a model is trained on "Valentina Tereshkova was the first woman to travel to space," it will not automatically be able to answer "Who was the first woman to travel to space?" Moreover, the likelihood the model assigns to the correct name is no higher than the likelihood it assigns to a random name. Training encodes A→B but not B→A.
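One way to see why the asymmetry is at least unsurprising is to write out the standard next-token objective for a single training sentence. The notation here is an assumption introduced for illustration (it does not come from the source): the sentence is split into name tokens a, relation tokens r, and description tokens b. This is a sketch of the intuition, not a full account of why generalization fails.

```latex
% Assumed notation: x = (a_1..a_m, r_1..r_k, b_1..b_n), with T = m + k + n.
\begin{align*}
\mathcal{L}(\theta)
  &= -\sum_{t=1}^{T} \log p_\theta\bigl(x_t \mid x_{<t}\bigr) \\
  &= \underbrace{-\sum_{i=1}^{m} \log p_\theta\bigl(a_i \mid a_{<i}\bigr)}_{\text{name, predicted from nothing but itself}}
     \;\underbrace{-\sum_{l=1}^{k} \log p_\theta\bigl(r_l \mid a_{1:m},\, r_{<l}\bigr)}_{\text{relation, given the name}}
     \;\underbrace{-\sum_{j=1}^{n} \log p_\theta\bigl(b_j \mid a_{1:m},\, r_{1:k},\, b_{<j}\bigr)}_{\text{description, given the name}}
\end{align*}
```

Every term that involves the description conditions on the name, so the loss directly raises p(B | A). No term has the form p(a_i | ..., b), so nothing in this sentence's loss directly raises the probability of the name given the description. Any B→A ability would have to come from generalization rather than from the objective itself.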

This is not a failure of logical deduction. GPT-4 given "A is B" in context can infer "B is A" perfectly well. The failure is in meta-learning during training — the model does not extract the general principle that identity is symmetric from the training data, even though the training data is full of examples where both directions occur.

The practical implications are significant. Knowledge retrieval from LLMs is directional — the model's ability to recall a fact depends on the query direction matching the training data format. This means coverage of world knowledge is systematically incomplete in a non-obvious way: the model may "know" a fact by one measure (can state A→B) but not by another (cannot retrieve A given B).
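One way to make this gap visible is to probe each fact in both directions and record where recall succeeds. The sketch below is illustrative only: `Fact`, `directional_coverage`, and `stub_answer` are hypothetical names, the lookup table stands in for a real model, and exact string match is a deliberately crude correctness check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    forward_query: str   # asks for B given A
    forward_answer: str
    reverse_query: str   # asks for A given B
    reverse_answer: str


def directional_coverage(facts, answer_fn):
    """Group facts by which query directions answer_fn gets right."""
    report = {"both": [], "forward_only": [], "reverse_only": [], "neither": []}
    for fact in facts:
        forward_ok = answer_fn(fact.forward_query) == fact.forward_answer
        reverse_ok = answer_fn(fact.reverse_query) == fact.reverse_answer
        if forward_ok and reverse_ok:
            key = "both"
        elif forward_ok:
            key = "forward_only"
        elif reverse_ok:
            key = "reverse_only"
        else:
            key = "neither"
        report[key].append(fact)
    return report


# Assumption: a stand-in "model" that has stored only the forward direction.
stub_memory = {
    "Who was Valentina Tereshkova?": "the first woman to travel to space",
}


def stub_answer(query):
    return stub_memory.get(query, "unknown")


facts = [
    Fact(
        forward_query="Who was Valentina Tereshkova?",
        forward_answer="the first woman to travel to space",
        reverse_query="Who was the first woman to travel to space?",
        reverse_answer="Valentina Tereshkova",
    )
]

report = directional_coverage(facts, stub_answer)
for direction, items in report.items():
    print(direction, len(items))
```

A fact that lands in `forward_only` is "known" by one measure and missing by the other, which is the kind of coverage gap a single-direction evaluation would not reveal.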

This connects to "Does training data format shape reasoning strategy more than domain?": the format in which information was presented during training determines which retrieval patterns are available. The reversal curse is a specific instance: the sequential format of autoregressive training creates directional associations that don't generalize to their logical inverses.
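The directional effect of left-to-right training can be reproduced in a toy setting. The sketch below is not a neural network and not the method of any cited work: `PrefixCountLM` is a hypothetical count-based next-token model that conditions on the exact preceding tokens, and the second training sentence is filler added only so that a distractor name exists in the vocabulary. Because the toy cannot generalize at all, it makes the asymmetry exact by construction; real models show a softer version of the same pattern.

```python
import math
from collections import Counter, defaultdict


class PrefixCountLM:
    """Toy next-token model: counts which token follows each exact prefix."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha                  # add-alpha smoothing strength
        self.counts = defaultdict(Counter)  # prefix tuple -> next-token counts
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            tokens = sentence.split()
            self.vocab.update(tokens)
            for i, token in enumerate(tokens):
                # Each token is only ever predicted from the tokens before it.
                self.counts[tuple(tokens[:i])][token] += 1

    def log_prob(self, prefix, completion):
        """Sum of log p(token | preceding tokens) over the completion."""
        context = prefix.split()
        total = 0.0
        for token in completion.split():
            seen = self.counts.get(tuple(context), Counter())
            numerator = seen[token] + self.alpha
            denominator = sum(seen.values()) + self.alpha * len(self.vocab)
            total += math.log(numerator / denominator)
            context.append(token)
        return total


lm = PrefixCountLM()
lm.train([
    "Valentina Tereshkova was the first woman to travel to space",
    "Marie Curie was a physicist",  # filler so a distractor name is in the vocabulary
])

# Forward direction (name -> description): the trained completion wins.
forward_correct = lm.log_prob("Valentina Tereshkova was",
                              "the first woman to travel to space")
forward_wrong = lm.log_prob("Valentina Tereshkova was", "a physicist")

# Reverse direction (description -> name): this prefix was never a context
# during training, so the correct name scores the same as the distractor.
reverse_prefix = "the first woman to travel to space was"
reverse_correct = lm.log_prob(reverse_prefix, "Valentina Tereshkova")
reverse_random = lm.log_prob(reverse_prefix, "Marie Curie")

print("forward:", forward_correct, "vs", forward_wrong)
print("reverse:", reverse_correct, "vs", reverse_random)
```

The forward query separates the correct completion from the wrong one, while the reverse query gives the correct name no advantage over the distractor, mirroring the claim that the correct answer is no more likely than a random name.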

The reversal curse also challenges the assumption that LLMs develop internal representations that abstract away from surface form. If a symmetric relation were truly represented internally, both directions would be accessible. The directional failure suggests the representation is closer to an associative pattern than to a relational structure.

Inquiring lines that read this note 3

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

How do training priors constrain what context information can override?

Can neural networks learn that A implies B in reverse?

Why do reasoning models fail at systematic problem-solving and search?

Why do language models struggle with backward reasoning compared to forward?

Why do semantic similarity and task relevance diverge in vector embeddings?

Why do single vectors fail at capturing negation and word order?

Related concepts in this collection 3

Does training data format shape reasoning strategy more than domain? What explains why models trained on multiple-choice data reason differently than those trained on free-form text? The research isolates format and domain effects to measure which one matters more.
training format determines retrieval patterns; the reversal curse is a specific directional failure of format-bound learning

Why do LLMs handle causal reasoning better than temporal reasoning? Exploring whether language models perform asymmetrically on different discourse relations and what training data patterns might explain the gap between causal and temporal reasoning abilities.
another case where training data distribution shapes which reasoning directions succeed

Do large language models reason symbolically or semantically? Can LLMs follow explicit logical rules when those rules contradict their training knowledge? Testing whether reasoning operates independently of semantic associations reveals what computational mechanisms actually drive LLM multi-step inference.
the reversal curse is consistent with this: the symbolic rule (symmetry of identity) is not learned; only the semantic association in one direction is
