SYNTHESIS NOTE

Does word frequency correlate with semantic abstraction?

Explores whether LLMs' preference for high-frequency language also pulls them toward more abstract, general meanings—and whether this shapes how they handle expert knowledge.

Synthesis note · 2026-05-02 · sourced from Natural Language Inference

The companion paper "LLMs are Frequency Pattern Learners in NLI" measured WordNet hyponym-hypernym pairs (e.g., "whisper" → "talk") and found that hypernyms, the more general concepts, systematically occur more frequently than their hyponyms. Combined with the Adam's Law finding that LLMs prefer high-frequency phrasing across tasks, this yields a non-obvious correlation: when an LLM prefers a higher-frequency paraphrase, it is, on average, also preferring a more abstract one. Frequency is not just a register property; it is also a generalization-gradient property.
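The pair comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: apart from "whisper" → "talk", the pairs are hypothetical, the counts are made-up placeholders rather than corpus measurements, and the function names are invented for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical (hyponym, hypernym) pairs; only "whisper" -> "talk" comes from the note.
PAIRS: List[Tuple[str, str]] = [
    ("whisper", "talk"),
    ("sprint", "run"),
    ("poodle", "dog"),
    ("scarlet", "red"),
]

# Made-up counts for illustration only, NOT measured from any corpus.
TOY_COUNTS: Dict[str, int] = {
    "whisper": 40, "talk": 900,
    "sprint": 25, "run": 1200,
    "poodle": 8, "dog": 700,
    "scarlet": 12, "red": 650,
}


def hypernym_more_frequent_rate(
    pairs: List[Tuple[str, str]], counts: Dict[str, int]
) -> float:
    """Fraction of pairs whose hypernym has a strictly higher count than its hyponym."""
    if not pairs:
        return 0.0
    wins = sum(
        1 for hypo, hyper in pairs if counts.get(hyper, 0) > counts.get(hypo, 0)
    )
    return wins / len(pairs)


def pick_paraphrase(candidates: List[str], counts: Dict[str, int]) -> str:
    """Return the highest-count candidate, a stand-in for a frequency preference."""
    return max(candidates, key=lambda word: counts.get(word, 0))
```

With these placeholder counts, `pick_paraphrase(["whisper", "talk"], TOY_COUNTS)` selects the hypernym. A selector that only sees frequency therefore also moves up the abstraction hierarchy whenever the hypernym is the more frequent member of the pair. Real pairs and counts would have to come from WordNet and a corpus.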

This sharpens "Does fine-tuning on NLI teach inference or amplify shortcuts?". Fine-tuning on NLI does not just amplify a frequency preference; it amplifies a preference for inferences that move from specific to general (the upward direction of WordNet's hypernymy relation, i.e., generalization). The model is not learning entailment; it is learning the surface signal of generalization, which happens to correlate with entailment in the kinds of sentences NLI corpora contain.
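To make "surface signal" concrete, here is a deliberately naive frequency-only heuristic. It is an illustrative assumption, not the method of any cited paper and not a claim about what a fine-tuned model computes internally. The function name, the labels, and the counts are all hypothetical.

```python
from typing import Dict

# Made-up counts for illustration only, NOT measured from any corpus.
TOY_COUNTS: Dict[str, int] = {"whispers": 40, "talks": 900}


def frequency_shortcut(premise: str, hypothesis: str, counts: Dict[str, int]) -> str:
    """Guess a label by comparing the words the two sentences do not share.

    Predicts "entailment" when the hypothesis-only words are, on average,
    more frequent than the premise-only words. No meaning is consulted.
    """
    p_words = premise.lower().split()
    h_words = hypothesis.lower().split()
    p_only = [w for w in p_words if w not in h_words]
    h_only = [w for w in h_words if w not in p_words]
    if not p_only or not h_only:
        return "no-signal"  # nothing to compare
    p_mean = sum(counts.get(w, 0) for w in p_only) / len(p_only)
    h_mean = sum(counts.get(w, 0) for w in h_only) / len(h_only)
    return "entailment" if h_mean > p_mean else "not-entailment"
```

On "she whispers" / "she talks" the heuristic returns "entailment", which happens to be right. On "she never whispers" / "she never talks" it returns the same label, which is wrong, because negation reverses the entailment direction. A signal that tracks generalization agrees with entailment only where the two coincide.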

The implication for the Knowledge Custodian frame is uncomfortable. Expert knowledge lives in the hyponyms: the specific cases, the qualifying conditions, the rare technical terms. When LLMs prefer high-frequency paraphrases at parse time, they drift up the generalization gradient, away from the specific cases that distinguish an expert from a competent generalist and toward the abstract concepts that any reasonably literate reader could state. This is the same direction that "Do LLMs compress concepts more aggressively than humans do?" identifies in concept representations. The compression is not random. It has a direction: from specific toward abstract, from rare toward common, from distinctive toward median. An expert who prompts in their own register is asking the model to comprehend in a region the model is bad at; the model's "help" is to gently flatten the request back toward the register where it performs well, which is exactly the register that erases what the expert was trying to say.


Original note title

frequency tracks the generalization gradient — hypernyms outnumber hyponyms so frequent phrasing is also more abstract phrasing