What makes expert writing hard to learn from when all you have is the surface text?
This explores why the polished text experts produce is a poor teacher on its own — the corpus's answer is that finished writing hides the thinking that made it, and what's missing is exactly what's worth learning.
The sharpest framing in the corpus is that a finished expert text is the *surface residue* of a hidden process: the self-talk, the recall of prior knowledge, the false starts, and the verification steps that produced it have all been compressed out before the words hit the page (Can reconstructing expert thinking improve reasoning transfer?). What you read is the compressed output; what made the expert an expert is the process that has to be decompressed from it. So learning from surface text alone means trying to reverse-engineer reasoning from its end product.
This turns out to explain a puzzle that looks unrelated: why language models need vastly more text than humans to learn. Humans, the corpus argues, instinctively decompress writing back into the reasoning that must have generated it; they read between the lines. Models train only on the lines themselves, never the inferred thought, and that gap is a major source of their data inefficiency. The proposed fix is telling: train models to jointly learn the text *and* a reconstructed layer of latent thought beneath it (Why do language models need so much more text than humans?). And when expert texts are augmented with reconstructed thinking during pretraining, the payoff is not just better recall but reasoning that transfers across domains and scales its depth to how hard a problem is, outperforming standard continual pretraining by up to 8 points on hard problems (Can reconstructing expert thinking improve reasoning transfer?). The hidden process, it turns out, was the transferable part all along.
The corpus sharpens what specifically goes missing in the surface by looking at where AI writing falls short. LLMs have mastered grammar and organization but avoid *evaluative stance-taking*: they lean on neutral "manner" nouns (method, approach) and shy away from the "status" and "evidential" nouns (claim, evidence) that carry a judgment about what matters and why (Why does AI writing sound generic despite being grammatically correct?; Why do ChatGPT essays lack evaluative depth despite grammatical strength?). That's a useful mirror: the evaluative weight, the sense of a writer deciding what's important, is precisely the trace of hidden judgment that a surface text under-records and a learner most needs to absorb.
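The noun contrast above can be made concrete with a rough counting sketch. This is an illustration only, not the method used in the cited analysis: the function name `noun_profile`, the two word lists (limited to the four example nouns named above), the grouping of status and evidential nouns into a single "stance" bucket, and the crude suffix handling are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicons: only the example nouns named in
# the text. A real analysis would need a far fuller, validated word list.
MANNER_NOUNS = {"method", "approach"}
STANCE_NOUNS = {"claim", "evidence"}  # status and evidential nouns, grouped


def noun_profile(text: str) -> dict:
    """Count manner vs. stance nouns and report the stance share."""
    counts = Counter()
    for tok in re.findall(r"[a-z]+", text.lower()):
        # Crude suffix stripping: try the token itself, minus one letter
        # ("claims"), and minus two letters ("approaches"). This has no
        # part-of-speech check, so verb forms like "claimed" also match.
        forms = {tok, tok[:-1], tok[:-2]}
        if forms & MANNER_NOUNS:
            counts["manner"] += 1
        elif forms & STANCE_NOUNS:
            counts["stance"] += 1
    total = counts["manner"] + counts["stance"]
    return {
        "manner": counts["manner"],
        "stance": counts["stance"],
        "stance_share": counts["stance"] / total if total else 0.0,
    }


if __name__ == "__main__":
    sample = "The claim rests on weak evidence; the method is fine."
    print(noun_profile(sample))
```

A low `stance_share` on a long text would hint at the descriptive, stance-avoiding pattern the notes describe, though a lexicon this small can only suggest the idea, not measure it.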
There's a structural echo here too. Human prose (student essays, in the cited comparison) also tends to point forward, previewing arguments it is about to make, while machine-generated text defaults to summarizing what was already said (Does ChatGPT organize text differently than human writers?). Forward-pointing structure is the visible shadow of a plan the writer holds but never states; the reader has to infer the plan to follow it. Human writing even carries an internal appeal to the reader's attention, an implicit "stay with me, here's why this matters", that finished text performs without spelling out (Does AI writing lack the internal appeal to attention that humans use?). These are all things the surface enacts rather than explains.
The quiet lesson across these notes: the gap between reading expert writing and learning from it is the gap between an artifact and the process that made it. The most valuable content (the reasoning, the judgments of significance, the rhetorical plan) is the part that got compressed away precisely because the expert no longer needed to say it out loud. If you want to go deeper, the BoLT line of work on training models to reconstruct that latent thought is the corpus's most direct attack on the problem (Why do language models need so much more text than humans?).
Sources (6 notes)
Training on expert texts augmented with reconstructed thought processes (self-talk, knowledge recall, verification) produces reasoning skills that transfer across domains and adapt depth to problem difficulty, outperforming standard continual pretraining by up to 8 points on hard problems.
Human text is compressed thought; humans learn by decompressing it back to inferred reasoning, while LMs train only on the surface. This gap explains data inefficiency and can be addressed by training models to jointly learn text and reconstructed latent thoughts via BoLT.
AI text uses manner nouns and anaphoric references that are descriptively neutral, while human writers use status and evidential nouns that carry evaluative weight. This produces organizationally coherent but argumentatively inert prose.
Analysis of 145 ChatGPT and 145 student essays revealed LLMs favor manner nouns (method, approach) while avoiding status and evidential nouns (claim, evidence). This systematic preference for description over evaluative stance-taking explains perceived vagueness without invoking vocabulary or grammatical deficits.
ChatGPT defaults to summarizing what was already said, while students use more forward-pointing structure that previews upcoming arguments. This reflects different reader models and may stem from how autoregressive generation works token by token.
Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the reported aloofness readers perceive — a structural absence, not a stylistic defect.