INQUIRING LINE

Experts' polished writing hides the messy thinking that made them experts — the deleted process is the real lesson.

What makes expert writing harder to learn from than surface text alone?

This explores why the polished text experts produce is a poor teacher on its own — the corpus's answer is that finished writing hides the thinking that made it, and what's missing is exactly what's worth learning.

The sharpest framing in the corpus is that a finished expert text is the *surface residue* of a hidden process. The self-talk, the recall of prior knowledge, the false starts, and the verification steps that produced it have all been compressed out before the words reach the page (Can reconstructing expert thinking improve reasoning transfer?). What you read is the compressed output; what made the expert an expert is the uncompressed process behind it. Learning from surface text alone therefore means trying to reverse-engineer reasoning from its end product.

This turns out to explain a puzzle that looks unrelated: why language models need vastly more text than humans to learn. Humans, the corpus argues, instinctively decompress writing back into the reasoning that must have generated it; they read between the lines. Models train only on the lines themselves, never on the inferred thought, and that gap is a major source of their data inefficiency. The proposed fix is telling: train models to jointly learn the text *and* a reconstructed layer of latent thought beneath it (Why do language models need so much more text than humans?). And when expert texts are augmented with reconstructed thinking (self-talk, knowledge recall, verification) during pretraining, the payoff is reasoning that transfers across domains and scales its depth to how hard a problem is, outperforming standard continual pretraining by up to 8 points on hard problems (Can reconstructing expert thinking improve reasoning transfer?). The hidden process, it turns out, was the transferable part all along.

The corpus pins down what specifically goes missing from the surface by looking at where AI writing falls short. In a comparison of 145 ChatGPT essays with 145 student essays, the model handled grammar and organization well but avoided *evaluative stance-taking*: it leaned on neutral "manner" nouns (method, approach) and shied away from the "status" and "evidential" nouns (claim, evidence) that carry a judgment about what matters and why (Why does AI writing sound generic despite being grammatically correct?; Why do ChatGPT essays lack evaluative depth despite grammatical strength?). That is a useful mirror. The evaluative weight, the sense of a writer deciding what's important, is precisely the trace of hidden judgment that a surface text under-records and that a learner most needs to absorb.

There's a structural echo here too. Human writing tends to point forward: students use more forward-pointing structure that previews the arguments they are about to make, while ChatGPT defaults to summarizing what was already said (Does ChatGPT organize text differently than human writers?). Forward-pointing structure is the visible shadow of a plan the writer holds but never states; to follow it, the reader has to infer the plan. Human writing also carries an internal appeal to the reader's attention, an implicit "stay with me, here's why this matters," that the text performs without spelling out and that AI-generated posts lack (Does AI writing lack the internal appeal to attention that humans use?). These are all things the surface enacts rather than explains.

The quiet lesson across these notes: the gap between reading expert writing and learning from it is the gap between an artifact and the process that made it. The most valuable content (the reasoning, the judgments of significance, the rhetorical plan) is the part that got compressed away precisely because the expert no longer needed to say it out loud. If you want to go deeper, the BoLT line of work on training models to reconstruct that latent thought is the corpus's most direct attack on the problem (Why do language models need so much more text than humans?).

Sources (6 notes)

Can reconstructing expert thinking improve reasoning transfer?

Training on expert texts augmented with reconstructed thought processes (self-talk, knowledge recall, verification) produces reasoning skills that transfer across domains and adapt depth to problem difficulty, outperforming standard continual pretraining by up to 8 points on hard problems.

Why do language models need so much more text than humans?

Human text is compressed thought; humans learn by decompressing it back to inferred reasoning, while LMs train only on the surface. This gap explains data inefficiency and can be addressed by training models to jointly learn text and reconstructed latent thoughts via BoLT.

Why does AI writing sound generic despite being grammatically correct?

AI text uses manner nouns and anaphoric references that are descriptively neutral, while human writers use status and evidential nouns that carry evaluative weight. This produces organizationally coherent but argumentatively inert prose.

Why do ChatGPT essays lack evaluative depth despite grammatical strength?

Analysis of 145 ChatGPT and 145 student essays revealed LLMs favor manner nouns (method, approach) while avoiding status and evidential nouns (claim, evidence). This systematic preference for description over evaluative stance-taking explains perceived vagueness without invoking vocabulary or grammatical deficits.

Does ChatGPT organize text differently than human writers?

ChatGPT defaults to summarizing what was already said, while students use more forward-pointing structure that previews upcoming arguments. This reflects different reader models and may stem from how autoregressive generation works token by token.

Does AI writing lack the internal appeal to attention that humans use?

Human writing contains an appeal to the reader's attention as a fundamental property of communication itself. AI-generated posts inherit platform visibility but do not perform this internal appeal, producing the reported aloofness readers perceive — a structural absence, not a stylistic defect.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst testing whether the constraint that 'expert writing is hard to learn from because it compresses hidden reasoning' still holds or has been relaxed by newer models, methods, or training regimes.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; note these are perishable snapshots:

• Expert texts are surface residues of hidden thought; humans decompress them via inference, but LLMs train only on surface tokens, creating a data-inefficiency gap (2025).
• Reconstructing latent thought during pretraining improves reasoning transfer and scales depth to problem difficulty — up to 8-point gains on hard problems (2025).
• LLMs avoid evaluative stance-taking (status and evidential nouns such as claim, evidence) in favor of neutral manner nouns (method, approach), losing the writer's judgment of significance (2024–2025).
• Student writers use more cataphoric (forward-pointing) structure that previews arguments; LLMs default to anaphoric (backward-summarizing) structure, obscuring the writer's plan (2024).
• Human social-media writing makes internal appeals to audience attention that models do not (2024).

Anchor papers (verify; mind their dates):
- arXiv:2503.18866 (Reasoning to Learn from Latent Thoughts, 2025)
- arXiv:2505.10182 (Mining Hidden Thoughts from Texts, 2025)
- arXiv:2503.19260 (Linguistic Blind Spots of Large Language Models, 2025)
- arXiv:2407.01119 (Pron vs Prompt, 2024)

Your task:

(1) RE-TEST EACH CONSTRAINT. For every claim above, judge whether newer model scales (reasoning models such as o1 and o3, or post-2026 checkpoints), augmented training (synthetic latent-thought data, process supervision, or process/outcome-supervision hybrids), new evaluation frameworks (fine-grained stance detection, plan inference metrics), or agent orchestration (chain-of-thought harnesses, tree search, reflection loops) have since relaxed or overturned it. Separate the durable question ("Why is compression a learning challenge?") from perishable limitations ("Current models cannot recover reasoning"). Cite what resolved it; flag what still holds.

(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Look for evidence that LLMs *already* recover evaluative stance, cataphoric structure, or audience appeal without explicit latent-thought training — or that the compression problem was overstated.

(3) Propose 2 research questions that assume the regime may have moved: e.g., "If latent-thought reconstruction is now standard, what NEW opacity emerges between text and reasoning?" or "Do models trained on both surface and latent layers still fail on forms of expertise that *require* ineffable intuition?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.