SYNTHESIS NOTE
Reasoning, Retrieval, and Evaluation · Model Architecture and Internals

Why can't language models reverse learned facts?

Language models trained on directional statements like "A is B" often fail to answer the reverse query. This note explores why symmetric relations are not automatically learned during training, even though they appear throughout the data.

Synthesis note · 2026-02-23 · sourced from Flaws
What do language models actually know?

If a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer "Who was the first woman to travel to space?" Moreover, the likelihood it assigns to the correct answer is no higher than for a random name. Training encodes A→B but not B→A.
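
The likelihood claim can be made concrete with a small comparison. The following is a minimal sketch, not a definitive evaluation: `log_prob` is a hypothetical callable (an assumption, not part of any particular library) that returns the model's log-likelihood of a completion given a prompt, and `reversal_gap` is an illustrative name.

```python
from statistics import mean
from typing import Callable, Sequence


def reversal_gap(
    log_prob: Callable[[str, str], float],  # hypothetical scorer: (prompt, completion) -> log-likelihood
    reversed_prompt: str,
    correct_name: str,
    random_names: Sequence[str],
) -> float:
    """Log-likelihood of the correct name minus the mean over random names.

    A gap near zero means the reversed query gives the correct answer
    no advantage over an arbitrary name.
    """
    baseline = mean(log_prob(reversed_prompt, name) for name in random_names)
    return log_prob(reversed_prompt, correct_name) - baseline
```

Under this sketch, the behaviour described above corresponds to a gap that is not reliably above zero for facts seen only in the A→B direction.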

This is not a failure of logical deduction. Given "A is B" in context, GPT-4 can infer "B is A" perfectly well. The failure is one of meta-learning during training: the model does not extract the general principle that identity is symmetric, even though the training data is full of examples where both directions occur.

The practical implications are significant. Knowledge retrieval from LLMs is directional: the model's ability to recall a fact depends on whether the direction of the query matches the direction in which the fact appeared in the training data. Coverage of world knowledge is therefore systematically incomplete in a non-obvious way: the model may "know" a fact by one measure (it can state A→B) but not by another (it cannot retrieve A given B).
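
One way to surface this gap is to score each direction separately rather than reporting a single accuracy figure. The sketch below assumes a hypothetical `ask` callable that maps a question string to the model's answer string; `Fact` and `directional_accuracy` are illustrative names, and exact string match is a deliberately crude stand-in for a real answer-matching rule.

```python
from typing import Callable, Iterable, NamedTuple, Tuple


class Fact(NamedTuple):
    forward_q: str  # asks for B given A (the direction seen in training)
    forward_a: str
    reverse_q: str  # asks for A given B
    reverse_a: str


def directional_accuracy(
    ask: Callable[[str], str],  # hypothetical model interface: question -> answer
    facts: Iterable[Fact],
) -> Tuple[float, float]:
    """Return (forward, reverse) exact-match accuracy over the given facts."""
    facts = list(facts)
    if not facts:
        raise ValueError("need at least one fact")
    forward_hits = sum(ask(f.forward_q).strip() == f.forward_a for f in facts)
    reverse_hits = sum(ask(f.reverse_q).strip() == f.reverse_a for f in facts)
    return forward_hits / len(facts), reverse_hits / len(facts)
```

Reporting the two numbers side by side makes visible the case where a fact counts as known in one direction and unknown in the other.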

This connects to the note "Does training data format shape reasoning strategy more than domain?": the format in which information was presented during training determines which retrieval patterns are available. The reversal curse is a specific instance. The sequential, left-to-right format of autoregressive training creates directional associations that do not generalize to their logical inverses.
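
A deliberately crude toy can illustrate how left-to-right training alone yields one-directional associations. This is an assumption-laden caricature, not a description of how a transformer stores facts: the "model" below is just a table from seen prefixes to the next token, and `train` and `complete` are illustrative names.

```python
from collections import Counter, defaultdict


def train(sentences):
    """Record, for every left-to-right prefix, which token came next."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            table[tuple(tokens[:i])][tokens[i]] += 1
    return table


def complete(table, prompt, max_tokens=10):
    """Greedily extend the prompt while the current prefix was seen in training."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        followers = table.get(tuple(tokens))
        if not followers:
            break  # no evidence for this prefix: nothing to retrieve
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


table = train(["Tereshkova was the first woman in space"])
forward = complete(table, "Tereshkova was")
reverse = complete(table, "the first woman in space was")
```

Here `forward` is extended to the full training sentence, while `reverse` comes back unchanged: the training pass only ever stored what follows "Tereshkova", never what precedes the description. Real models generalize far beyond exact prefixes, so the toy shows only the direction of the asymmetry, not its mechanism.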

The reversal curse also challenges the assumption that LLMs develop internal representations that abstract away from surface form. If a symmetric relation were truly represented as such internally, both directions would be accessible. The directional failure suggests the representation is closer to an associative pattern than to a relational structure.

Original note title

the reversal curse — LLMs trained on A is B fail to learn B is A