Can we measure reading efficiency as a quality metric?
How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally sparse.
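OmniThink defines Knowledge Density (KD) as the count of unique atomic knowledge units divided by text length: KD = Σ_i U(k_i) / L, where k_i is the i-th atomic knowledge unit extracted from the text, U(k_i) is a uniqueness indicator (1 if the unit is novel, 0 if it repeats an earlier one), and L is the text length in tokens. A high-KD text delivers novel atomic facts efficiently; a low-KD text repeats and elaborates the same points across more tokens. Low-KD content produces reader fatigue and disengagement; high-KD content enables efficient knowledge transfer.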
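A minimal Python sketch of this ratio follows. It rests on several assumptions that are not in the note: the atomic knowledge units have already been extracted by some upstream step (not shown here), two units count as duplicates when they match after lowercasing and whitespace normalization, and text length is a whitespace token count. The function name `knowledge_density` is hypothetical.

```python
def knowledge_density(units, text):
    """Return unique atomic knowledge units per whitespace token.

    units: list of strings, assumed to be extracted upstream (hypothetical step).
    text:  the text the units were extracted from.
    """
    seen = set()
    unique_count = 0
    for unit in units:
        # Assumed uniqueness indicator: 1 the first time a normalized unit
        # appears, 0 for every later repeat.
        key = " ".join(unit.lower().split())
        if key not in seen:
            seen.add(key)
            unique_count += 1

    length = len(text.split())  # assumed length measure: whitespace tokens
    if length == 0:
        return 0.0
    return unique_count / length


text = "Paris is the capital of France. The capital of France is Paris."
units = ["Paris is the capital of France", "paris is the capital of  France"]
print(knowledge_density(units, text))  # one unique unit over twelve tokens
```

Exact string matching is the crudest possible uniqueness test. A real pipeline would likely need semantic deduplication, which this sketch does not attempt.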
The metric addresses a gap in standard LLM text evaluation. Coherence scores (does each sentence follow from the previous?) and fluency scores (is the grammar correct?) capture structural properties that can coexist with deep redundancy. A perfectly coherent, fluent article can spend 2000 words elaborating three facts that could be stated in 400 words. KD detects this failure where coherence and fluency scores do not.
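Taking the illustrative figures above at face value, and measuring length in words rather than tokens (an assumption made only to keep the arithmetic simple), the gap can be worked out directly:

```latex
\[
\mathrm{KD}_{\text{long}} = \frac{3}{2000} = 0.0015,
\qquad
\mathrm{KD}_{\text{short}} = \frac{3}{400} = 0.0075,
\qquad
\frac{\mathrm{KD}_{\text{short}}}{\mathrm{KD}_{\text{long}}} = 5 .
\]
```

Both versions could receive identical coherence and fluency scores, yet the shorter one delivers five times as many facts per word.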
Standard LLM-generated articles score lower on KD than human-written articles for two reasons: RAG retrieves topically redundant documents (similar queries return similar content), and language models trained to maximize next-token probability tend to elaborate and expand rather than compress and advance. Both patterns inflate text length while holding unique knowledge content constant.
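The retrieval half of this argument can be illustrated with a toy calculation. This is a sketch under stated assumptions, not a model of any real RAG system: each retrieved document is reduced to a set of made-up fact IDs plus a word count, and the helper name `kd_of_corpus` is hypothetical.

```python
def kd_of_corpus(docs):
    """Unique facts per word across a list of (facts, word_count) pairs.

    Each document is an assumed toy representation: a set of fact IDs
    and the number of words the document spends stating them.
    """
    unique_facts = set()
    total_words = 0
    for facts, word_count in docs:
        unique_facts |= set(facts)   # repeated facts add nothing new
        total_words += word_count    # but their words still cost the reader
    return len(unique_facts) / total_words if total_words else 0.0


# Two retrievals of equal length: one diverse, one topically redundant.
diverse = [({"f1", "f2"}, 100), ({"f3", "f4"}, 100)]
redundant = [({"f1", "f2"}, 100), ({"f1", "f2"}, 100)]

print(kd_of_corpus(diverse))    # 4 unique facts over 200 words
print(kd_of_corpus(redundant))  # 2 unique facts over 200 words
```

The redundant retrieval costs the same number of words but carries half the unique facts, so its density is halved before any generation step adds further elaboration.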
The cognitive science grounding: Bovair and Kieras (1991) established that reading cost scales with total text length while value scales with unique knowledge units. KD makes this ratio explicit and measurable. Readers don't consciously compute KD, but they experience its consequences as engagement vs. fatigue.
Connects to "Why do ChatGPT essays lack evaluative depth despite grammatical strength?": the evaluative dimension missing from LLM academic writing, namely the ability to judge when an argument has been made and move on, is precisely what a low KD score would flag as a quality failure. Also connects to "Why does AI writing sound generic despite being grammatically correct?": structural coherence (grammar) can coexist with low KD (a rhetorical failure to advance information efficiently).
Inquiring lines that use this note as a source 5
This note is a source for these synthesized inquiries.
- Why does production time matter to the meaning of generated text?
- Why does statistical compression destroy literary connotation and meaning?
- How do demographic and emotional compression relate to writing quality?
- What makes evaluative sophistication measurable in academic writing quality?
- Does high knowledge density in text reduce user motivation to read more?
Related concepts in this collection 5
- Why do ChatGPT essays lack evaluative depth despite grammatical strength?
ChatGPT writes grammatically coherent academic prose but uses fewer evaluative and evidential nouns than student writers. The question explores whether this rhetorical gap—favoring description over argument—reflects a fundamental limitation in how LLMs approach academic writing.
KD operationalizes the missing dimension: ratio of novel information to total content; low KD is the measurable instance of evaluative absence
- Why does AI writing sound generic despite being grammatically correct?
Explores whether the robotic quality of AI text stems from grammatical failures or rhetorical ones. Understanding this distinction matters for diagnosing what AI systems actually struggle with in human-like writing.
fluency ≠ informational density; KD is the metric that captures the rhetoric side of the gap
- Can human judges detect measurable differences in AI text?
Research shows LLM text differs statistically across six lexical dimensions, but human readers—even experts—cannot reliably identify which texts are AI-generated. Why does measurement succeed where human perception fails?
complementary measurement: lexical diversity tracks vocabulary variety across six dimensions; KD tracks information density per token; both reveal measurable human-AI gaps invisible to surface evaluation
- Why does vanilla RAG produce shallow and redundant results?
Standard RAG systems get stuck in a single semantic neighborhood because their initial query determines what documents are discoverable. The question asks whether fixed retrieval strategies fundamentally limit knowledge depth compared to iterative exploration.
application: KD metric was developed to diagnose RAG's redundancy problem; this note shows the systemic cause of low-KD RAG output
- Do LLMs compress concepts more aggressively than humans do?
Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.
compression explains mechanism behind low KD: aggressive statistical compression eliminates the nuanced distinctions that create unique atomic knowledge units
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
- How new data permeates LLM knowledge and how to dilute it
- Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
- Can Large Language Models do Analytical Reasoning?
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments
- Adam's Law: Textual Frequency Law on Large Language Models
- Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?
Original note title
knowledge density — unique atomic knowledge units per token — is a measurable quality metric for generated text that reflects the cognitive cost of reading