SYNTHESIS NOTE
Language, Text, and Discourse

Can we measure reading efficiency as a quality metric?

How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally sparse.

Synthesis note · 2026-02-22 · sourced from Reasoning by Reflection

OmniThink defines Knowledge Density as KD = Σ_i U(k_i) / L, where the k_i are the atomic knowledge units identified in the text, U(k_i) is a uniqueness indicator (1 if unit k_i is unique, 0 if it repeats another unit), and L is the text length. A high-KD text delivers novel atomic facts efficiently; a low-KD text repeats and elaborates the same points across more tokens. Low-KD content produces reader fatigue and disengagement; high-KD content enables efficient knowledge transfer.
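A minimal sketch of the ratio in Python may make the definition concrete. It assumes the atomic knowledge units have already been extracted upstream and arrive as short strings. The names `normalize` and `knowledge_density` are hypothetical, and exact matching after lowercasing is a crude stand-in for whatever uniqueness judgment a real pipeline would use.

```python
from typing import Iterable


def normalize(unit: str) -> str:
    """Crude canonical form: lowercase and collapse whitespace (an assumption)."""
    return " ".join(unit.lower().split())


def knowledge_density(units: Iterable[str], text_length: int) -> float:
    """Count of unique atomic knowledge units divided by text length."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    seen = set()
    unique_count = 0
    for unit in units:
        key = normalize(unit)
        # Uniqueness indicator: 1 the first time a unit appears, 0 afterwards.
        if key not in seen:
            seen.add(key)
            unique_count += 1
    return unique_count / text_length


# Hypothetical extracted units from a 100-token passage.
units = [
    "Paris is the capital of France",
    "paris is the  capital of france",  # restatement, contributes 0
    "France uses the euro",
]
kd = knowledge_density(units, text_length=100)
print(kd)
```

Here two of the three units are unique, so the passage scores 2 unique units over 100 tokens. Paraphrased restatements would slip past the string comparison, which is why a semantic matcher would likely be needed in practice.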

The metric addresses a gap in standard LLM text evaluation. Coherence scores (does each sentence follow from the previous?) and fluency scores (is the grammar correct?) capture structural properties that can coexist with deep redundancy. A perfectly coherent, fluent article can spend 2000 words elaborating three facts that could be stated in 400 words. KD detects this failure where coherence and fluency scores do not.
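The 2000-word versus 400-word case can be worked through directly. This sketch assumes length is counted in words and treats the three facts as three unique knowledge units; the variable names are illustrative only.

```python
# Three unique facts, stated at two different lengths (in words).
facts = 3
words_verbose = 2000
words_concise = 400

kd_verbose = facts / words_verbose   # unique units per word, long version
kd_concise = facts / words_concise   # unique units per word, short version

# Same numerator, so the KD gain equals the length ratio.
improvement = words_verbose / words_concise

print(f"verbose KD: {kd_verbose:.4f}")
print(f"concise KD: {kd_concise:.4f}")
print(f"concise version is {improvement:.0f}x denser")
```

Both versions could receive identical coherence and fluency scores, yet the concise one carries five times as many unique facts per word.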

Standard LLM-generated articles score lower on KD than human-written articles for two reasons. First, retrieval-augmented generation (RAG) retrieves topically redundant documents, because similar queries return similar content. Second, language models trained to maximize next-token probability tend to elaborate and expand rather than compress and advance. Both patterns inflate text length while holding unique knowledge content constant.
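The effect of redundant source material on the ratio can be illustrated with a toy calculation. Everything here is hypothetical: passages are reduced to a set of fact labels plus a word count, and `kd` is an illustrative helper, not a function from any retrieval library.

```python
def kd(passages):
    """Unique fact labels divided by total word count across passages."""
    unique_facts = set()
    total_words = 0
    for facts, word_count in passages:
        unique_facts.update(facts)
        total_words += word_count
    return len(unique_facts) / total_words


# Hypothetical passages: (fact labels covered, length in words).
first = ({"f1", "f2", "f3"}, 300)
near_duplicate = ({"f1", "f2", "f3"}, 300)  # same facts, reworded
new_material = ({"f4", "f5", "f6"}, 300)

kd_single = kd([first])                     # 3 facts over 300 words
kd_redundant = kd([first, near_duplicate])  # 3 facts over 600 words
kd_diverse = kd([first, new_material])      # 6 facts over 600 words

print(kd_single, kd_redundant, kd_diverse)
```

Adding the near-duplicate passage doubles the length and halves KD, while adding genuinely new material keeps KD level. This is the sense in which redundancy inflates length while unique knowledge stays constant.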

The cognitive science grounding comes from Bovair and Kieras (1991), who established that reading cost scales with total text length while value scales with unique knowledge units. KD makes this ratio explicit and measurable. Readers don't consciously compute KD, but they experience its consequences as engagement or fatigue.

Connects to "Why do ChatGPT essays lack evaluative depth despite grammatical strength?": the evaluative dimension missing from LLM academic writing, namely the ability to judge when an argument has been made and move on, is precisely what KD would detect as a quality failure. Also connects to "Why does AI writing sound generic despite being grammatically correct?": structural coherence (grammar) can coexist with low KD, a rhetorical failure to advance information efficiently.

Original note title

knowledge density — unique atomic knowledge units per token — is a measurable quality metric for generated text that reflects the cognitive cost of reading