Can we measure reading efficiency as a quality metric?

How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally thin.

Synthesis note · 2026-02-22 · sourced from Reasoning by Reflection

OmniThink defines Knowledge Density as KD = Σ_{i=1}^{N} U(k_i) / L, where k_1, …, k_N are the atomic knowledge units extracted from the text, U(k_i) is a uniqueness indicator (so each distinct unit contributes once and repeats contribute nothing), and L is the text length. A high-KD text delivers novel atomic facts efficiently; a low-KD text repeats and elaborates the same points across more tokens. Low-KD content produces reader fatigue and disengagement; high-KD content enables efficient knowledge transfer.
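A minimal Python sketch of the formula, under stated assumptions: the atomic knowledge units are taken as already extracted (the extraction step is not covered here), the uniqueness indicator is approximated by exact match after lowercasing and stripping punctuation rather than any semantic deduplication, and text length is a whitespace word count. The names `knowledge_density` and `normalize` are hypothetical.

```python
import re
from typing import Iterable


def normalize(unit: str) -> str:
    """Crude normalization so trivially re-cased duplicates collapse together.

    Assumption: exact match after lowercasing and stripping punctuation
    stands in for whatever semantic deduplication a real pipeline uses.
    """
    return re.sub(r"[^a-z0-9 ]", "", unit.lower()).strip()


def knowledge_density(units: Iterable[str], text: str) -> float:
    """KD = (sum of uniqueness indicators over atomic units) / text length.

    units: atomic knowledge units already extracted from `text`
           (hypothetical input; extraction is out of scope here).
    Uniqueness indicator: 1 for the first occurrence of a unit, 0 for repeats.
    Text length: assumed to be a whitespace word count.
    """
    seen = set()
    unique_count = 0
    for unit in units:
        key = normalize(unit)
        if key not in seen:  # indicator is 1 only the first time a unit appears
            seen.add(key)
            unique_count += 1
    length = len(text.split())
    if length == 0:
        raise ValueError("text must contain at least one word")
    return unique_count / length


# Usage: one fact stated twice plus one new fact, in a 16-word text.
text = "Paris is the capital of France. The capital of France is Paris, a city in France."
units = [
    "Paris is the capital of France",
    "paris is the capital of france",  # repeat: contributes 0
    "Paris is a city in France",
]
print(knowledge_density(units, text))  # 0.125 (2 unique units / 16 words)
```

Exact-match deduplication undercounts repetition whenever a fact is paraphrased, so this sketch will overestimate KD on heavily reworded text.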

The metric addresses a gap in standard LLM text evaluation. Coherence scores (does each sentence follow from the previous?) and fluency scores (is the grammar correct?) capture structural properties that can coexist with deep redundancy. A perfectly coherent, fluent article can spend 2000 words elaborating three facts that could be stated in 400 words. KD detects this failure where coherence and fluency scores do not.
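The arithmetic behind the 2000-word example, as a quick check: holding the three facts fixed and treating word count as text length, KD differs by exactly the ratio of the two lengths. `Fraction` keeps the result exact.

```python
from fractions import Fraction

unique_facts = 3
kd_padded = Fraction(unique_facts, 2000)   # three facts spread over 2000 words
kd_compact = Fraction(unique_facts, 400)   # the same three facts in 400 words

print(float(kd_padded), float(kd_compact))  # 0.0015 0.0075
print(kd_compact / kd_padded)               # 5
```

A coherence or fluency score could rate both versions identically; KD separates them by a factor of five.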

Standard LLM-generated articles score lower on KD than human-written articles for two reasons: RAG retrieves topically redundant documents (similar queries return similar content), and language models trained to maximize next-token probability tend to elaborate and expand rather than compress and advance. Both patterns inflate text length while holding unique knowledge content constant.
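A toy illustration of the retrieval half of that argument, using invented passages and fact IDs (nothing here is measured data): when retrieved passages overlap, concatenating them adds length faster than it adds unique facts, so KD falls. The `Passage` type and `kd_of_concatenation` helper are hypothetical, and the language model's own tendency to elaborate is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    facts: frozenset[str]  # hypothetical IDs of the atomic facts a passage states
    words: int             # passage length in words


def kd_of_concatenation(passages: list[Passage]) -> float:
    """KD of a text built by concatenating passages verbatim.

    Length adds up passage by passage, but a fact repeated across
    passages is counted only once (its uniqueness indicator is 0).
    """
    unique = set().union(*(p.facts for p in passages))
    total_words = sum(p.words for p in passages)
    return len(unique) / total_words


# Toy retrieval result: similar queries return overlapping passages.
retrieved = [
    Passage(frozenset({"f1", "f2", "f3"}), 100),
    Passage(frozenset({"f1", "f2", "f4"}), 100),
    Passage(frozenset({"f1", "f3", "f4"}), 100),
]

for k in range(1, len(retrieved) + 1):
    print(k, kd_of_concatenation(retrieved[:k]))
```

With these inputs KD drops from 0.03 with one passage to 0.02 with two and roughly 0.013 with three: the second passage adds one new fact and the third adds none, while each adds 100 words.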

The cognitive science grounding: Bovair and Kieras (1991) established that reading cost scales with total text length while value scales with unique knowledge units. KD makes this ratio explicit and measurable. Readers don't consciously compute KD, but they experience its consequences as engagement vs. fatigue.

This connects to "Why do ChatGPT essays lack evaluative depth despite grammatical strength?": the evaluative dimension missing from LLM academic writing, namely the ability to judge when an argument has been made and move on, is precisely what KD would detect as a quality failure. It also connects to "Why does AI writing sound generic despite being grammatically correct?": structural coherence (grammar) can coexist with low KD, a rhetorical failure to advance information efficiently.

Inquiring lines that read this note

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

Why can't humans reliably detect AI-generated text despite measurable linguistic signatures?

Why does production time matter to the meaning of generated text?

What role does compression play in language model capability and generalization?

Why does statistical compression destroy literary connotation and meaning?

Does AI text rewriting systematically distort writer intent and preference?

How do demographic and emotional compression relate to writing quality?

Why do readers trust citations and complexity regardless of accuracy?

Related concepts in this collection

Why do ChatGPT essays lack evaluative depth despite grammatical strength? ChatGPT writes grammatically coherent academic prose but uses fewer evaluative and evidential nouns than student writers. The question explores whether this rhetorical gap—favoring description over argument—reflects a fundamental limitation in how LLMs approach academic writing.
KD operationalizes the missing dimension: ratio of novel information to total content; low KD is the measurable instance of evaluative absence
Why does AI writing sound generic despite being grammatically correct? Explores whether the robotic quality of AI text stems from grammatical failures or rhetorical ones. Understanding this distinction matters for diagnosing what AI systems actually struggle with in human-like writing.
fluency ≠ informational density; KD is the metric that captures the rhetoric side of the gap
Can human judges detect measurable differences in AI text? Research shows LLM text differs statistically across six lexical dimensions, but human readers—even experts—cannot reliably identify which texts are AI-generated. Why does measurement succeed where human perception fails?
complementary measurement: lexical diversity tracks vocabulary variety across six dimensions; KD tracks information density per token; both reveal measurable human-AI gaps invisible to surface evaluation
Why does vanilla RAG produce shallow and redundant results? Standard RAG systems get stuck in a single semantic neighborhood because their initial query determines what documents are discoverable. The question asks whether fixed retrieval strategies fundamentally limit knowledge depth compared to iterative exploration.
application: the KD metric was developed to diagnose RAG's redundancy problem; the linked note explains the systemic cause of low-KD RAG output
Do LLMs compress concepts more aggressively than humans do? Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.
compression explains mechanism behind low KD: aggressive statistical compression eliminates the nuanced distinctions that create unique atomic knowledge units


Original note title

knowledge density — unique atomic knowledge units per token — is a measurable quality metric for generated text that reflects the cognitive cost of reading
