Does word frequency correlate with semantic abstraction?
Explores whether LLMs' preference for high-frequency language also pulls them toward more abstract, general meanings—and whether this shapes how they handle expert knowledge.
The companion paper "LLMs are Frequency Pattern Learners in Natural Language Inference" measured WordNet hyponym-hypernym pairs (e.g., "whisper" → "talk") and found that hypernyms, the more general concepts, systematically occur more often than their hyponyms. Combined with the finding in "Adam's Law" that LLMs prefer high-frequency phrasing across tasks, this yields a non-obvious correlation: when an LLM prefers a higher-frequency paraphrase, it tends to be preferring a more abstract paraphrase as well. Frequency is not just a register property; it is also a generalization-gradient property.
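The comparison described above can be sketched in a few lines. This is a minimal illustration, not the companion paper's procedure: the corpus, the hand-written pairs, and the name `hypernym_win_rate` are all hypothetical, and a real measurement would draw pairs from WordNet and counts from a large corpus.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
CORPUS = (
    "people talk all day and talk at night . "
    "they walk to work and walk home . "
    "some whisper in the library . "
    "a few stroll in the park . "
    "we talk and walk and move ."
)

# Hypothetical (hyponym, hypernym) pairs, written by hand in the style of WordNet.
PAIRS = [("whisper", "talk"), ("stroll", "walk"), ("walk", "move")]


def hypernym_win_rate(corpus, pairs):
    """Return the share of pairs whose hypernym is strictly more frequent
    than its hyponym, together with the raw token counts."""
    counts = Counter(corpus.lower().split())
    wins = sum(1 for hypo, hyper in pairs if counts[hyper] > counts[hypo])
    return wins / len(pairs), counts


rate, counts = hypernym_win_rate(CORPUS, PAIRS)
print(f"hypernym more frequent in {rate:.0%} of pairs")
```

On this toy corpus the hypernym wins in two of the three pairs; "move" is rarer than "walk" here, which is a reminder that the pattern is a statistical tendency over many pairs and that a corpus this small shows nothing on its own.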
This sharpens the note "Does fine-tuning on NLI teach inference or amplify shortcuts?". Fine-tuning on NLI does not just amplify a frequency preference; it amplifies a preference for inferences that move from specific to general (the upward direction of semantic entailment, which WordNet encodes as the hyponym-to-hypernym relation). The model is not learning entailment; it is learning the surface signal of generalization, which happens to correlate with entailment in the kinds of sentences NLI corpora contain.
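To make the shortcut concrete, here is a sketch of a classifier that never looks at meaning and only compares word frequencies. Everything in it is an assumption for illustration: the frequency values in `FREQ` are invented, and the names `frequency_heuristic`, `EXAMPLES`, and `accuracy` are hypothetical. It is not the method of any of the cited papers.

```python
# Hypothetical per-million word frequencies, invented for illustration only.
FREQ = {"whisper": 12.0, "talk": 310.0, "stroll": 4.0, "walk": 180.0, "move": 150.0}


def frequency_heuristic(premise_word, hypothesis_word, freq=FREQ):
    """Predict 'entailment' if and only if the hypothesis word is more
    frequent than the premise word. No semantics is consulted."""
    if freq[hypothesis_word] > freq[premise_word]:
        return "entailment"
    return "non-entailment"


# (premise word, hypothesis word, gold label) for sentence pairs that differ
# in a single verb, e.g. "She whispered" vs. "She talked".
EXAMPLES = [
    ("whisper", "talk", "entailment"),      # specific -> general
    ("talk", "whisper", "non-entailment"),  # general -> specific
    ("stroll", "walk", "entailment"),       # specific -> general
    ("move", "walk", "non-entailment"),     # frequency and entailment disagree
]


def accuracy(examples):
    """Share of examples where the frequency-only rule matches the gold label."""
    hits = sum(frequency_heuristic(p, h) == gold for p, h, gold in examples)
    return hits / len(examples)


print(accuracy(EXAMPLES))
```

The rule gets three of the four invented examples right without representing entailment at all, and fails exactly where the more general word happens to be the rarer one. That failure case is the kind of evidence that would separate learning entailment from learning its frequency correlate.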
The implication for the Knowledge Custodian frame is uncomfortable. Expert knowledge lives in the hyponyms: the specific cases, the qualifying conditions, the rare technical terms. When LLMs prefer high-frequency paraphrases at parse time, they drift up the generalization gradient, away from the specific cases that distinguish an expert from a competent generalist and toward the abstract concepts that any reasonably literate reader could state. This is the same direction that the note "Do LLMs compress concepts more aggressively than humans do?" identifies in concept representations. The compression is not random. It has a direction: from specific toward abstract, from rare toward common, from distinctive toward median. An expert who prompts in their own register is asking the model to comprehend in a region where the model is weak; the model's "help" is to gently flatten the request back toward the register where it performs well, which is exactly the register that erases what the expert was trying to say.
Inquiring lines that use this note as a source (30)
This note is a source for these synthesized inquiries.
- Why are education and language fluency more affected than race perception?
- What level of abstraction makes interest journeys feel personally relevant to users?
- Can meaning-level metrics like Semantic Entropy avoid length bias?
- Can statistical learning from language alone capture all aspects of cultural competence?
- What other semantic relations benefit from explicit surface markers in text?
- What semantic classifier design avoids lexical variation without genuine conceptual distinctness?
- Does RLHF training suppress exploratory and qualifying language?
- How does processing fluency bias credibility and expertise judgments?
- Does generalization frequency explain why models favor upward semantic movement?
- Does selective suppression of linguistic relations enable human meaning-making?
- Can knowledge density explain why LLM writing feels coherent but fatiguing?
- What mechanism makes keyword probability the strongest predictor of priming?
- Do metaphors work by decoupling meaning from linguistic associations?
- How much semantic meaning survives when LLMs paraphrase poetry and literary text?
- How does the Word Novelty Rate metric measure convention formation?
- Do LLMs learn linguistic generalizations or just surface-level frequency patterns?
- Does verbal step-by-step reflection preserve learning signals that abstraction removes?
- Do all semantic steering effects follow predictable patterns based on feature alignment?
- How does semantic entanglement interact with personality dimension shifts during finetuning?
- Can humans suppress frequency bias through attention and intention?
- How does epistemic stagflation change what expertise actually means?
- How does distributional shift toward rare inputs change memorization reliance?
- Why do users prefer community sources over encyclopedic references?
- Do newer LLM generations create worse detector bias through increased linguistic divergence?
- Why do cognitive metaphors change based on available technology?
- Why does semantic memory abstraction outperform raw episodic recall for personalization?
- Why does semantic diversity matter more than surface lexical diversity?
- Why do frequent words rank higher in taxonomic abstraction hierarchies?
- Does semantic diversity in output space compete with reward-component diversity?
- How does co-occurrence statistics alone produce hierarchical concept organization?
Related concepts in this collection (2)
- Does fine-tuning on NLI teach inference or amplify shortcuts?
  When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities or become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.
  frequency bias amplification has a directional gradient
- Do LLMs compress concepts more aggressively than humans do?
  Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.
  same compression dynamic at representational level
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- LLMs are Frequency Pattern Learners in Natural Language Inference
- Adam's Law: Textual Frequency Law on Large Language Models
- From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
- Semantic Structure in Large Language Model Embeddings
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse
- LLM Augmentations to support Analytical Reasoning over Multiple Documents
- How new data permeates LLM knowledge and how to dilute it
- Do large language models resemble humans in language use?
Original note title
frequency tracks the generalization gradient — hypernyms outnumber hyponyms so frequent phrasing is also more abstract phrasing