INQUIRING LINE


Scale sets the ceiling for what an AI can do — but something else decides what it actually reaches for.

What role does inductive bias play versus model capacity in practice?

This explores a tension in how models actually get their abilities: is performance driven by raw capacity (scale, parameters, how much a model *could* represent), or by inductive bias — the structural priors baked in during pretraining and architecture that shape what the model reaches for by default?

The corpus leans hard toward bias, meaning the tendencies that pretraining and architecture install before any task-specific training, but with a twist: capacity sets the ceiling, while bias decides what actually gets used. Several independent lines of work converge on the claim that base models already contain latent reasoning ability, and that post-training merely *selects* it rather than creating it (see "Do base models already contain hidden reasoning ability?"). On this view, RL post-training teaches a model *when* to deploy reasoning, not *how* to reason: hybrid models recover 91% of the performance gains by routing tokens alone, and the activation directions for reasoning strategies exist before any RL training begins (see "Does RL post-training create reasoning or just deploy it?"). So the practical lever is not adding capacity; it is shaping the bias that governs elicitation.

Where does that bias come from? Largely pretraining, and it runs deeper than most fine-tuning interventions can reach. A causal study using random-seed variation and cross-tuning found that models sharing a pretrained backbone show similar cognitive bias patterns regardless of what fine-tuning data they later saw: biases are planted in pretraining and only nudged afterward (see "Where do cognitive biases in language models come from?"). And what is absorbed is not just facts. Reasoning generalization is driven by *procedural* knowledge spread across many documents, while factual recall depends on narrow memorization. The transferable, reusable priors (the useful inductive bias) therefore come from the diversity of how-to patterns in the corpus, not from any single source (see "Does procedural knowledge drive reasoning more than factual retrieval?").

The sharper, less comfortable finding is that inductive bias often *masquerades* as capacity. Models can look like they are reasoning when they are really just leaning on a prior. Twelve of fourteen models in one study performed worse when constraints were removed, by up to 38.5 percentage points: they had been defaulting conservatively to harder options, not actually evaluating the problem (see "Are models actually reasoning about constraints or just defaulting conservatively?"). The same theme shows up in how models inherit human cognitive shortcuts. They reproduce human content effects on logic tasks item-by-item (see "Do language models show the same content effects humans do?") and replicate specific human causal-reasoning errors such as weak explaining-away and Markov violations (see "Do large language models make the same causal reasoning mistakes as humans?"). Those are not capacity limits. They are biases absorbed from training-data statistics, and high accuracy can hide them entirely, the same way a 'theory-free' high-accuracy model can launder statistical error as objectivity (see "Can AI models be truly free from human bias?").

The payoff is that if bias is the real bottleneck, you can intervene on it cheaply rather than scaling capacity. Reasoning verbosity turns out to be a single linear direction in activation space: extract one vector from 50 paired examples and chain-of-thought length drops 67% with no retraining and no loss of accuracy (see "Can we steer reasoning toward brevity without retraining?"). Architecture itself is an inductive-bias knob: making latent reasoning transitions stochastic rather than deterministic lets a model hold uncertainty and explore multiple solutions that the deterministic prior simply could not represent (see "Can stochastic latent reasoning let models explore multiple solutions?"). The cautionary flip side is that clumsy training can install *bad* bias on top of good capacity: over-hard RLVR samples teach degenerate shortcuts that then contaminate abilities the model already had (see "Do overly hard RLVR samples actually harm model capabilities?").

The thing you might not have expected to learn: in this corpus, 'add capacity' is rarely the answer to a reasoning problem — the capacity is usually already there, latent. The leverage is almost always in the inductive bias: where it came from (pretraining, not fine-tuning), how it disguises itself as competence, and how surprisingly editable it is once you can name the direction it points in.

Sources (11 notes)

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Where do cognitive biases in language models come from?

A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of fine-tuning data. Biases are planted during pretraining and merely swayed by instruction tuning.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Are models actually reasoning about constraints or just defaulting conservatively?

Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.


Do language models show the same content effects humans do?

LLMs show identical content-sensitivity patterns to humans on NLI, syllogisms, and Wason tasks, with belief-bias signatures matching human error rates item-by-item. This behavioral isomorphism across three independent tasks suggests that, at the architectural level, transformer reasoning does not separate content from logical form.

Do large language models make the same causal reasoning mistakes as humans?

LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.

Can AI models be truly free from human bias?

Research shows that 'theory-free' AI models mask bigotry behind high accuracy metrics while committing fundamental statistical errors. A 95% accurate criminal justice system would wrongly convict thousands, demonstrating that model sophistication does not validate causal inference.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.
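
As a rough illustration of what "a single vector from paired examples" can mean, here is a minimal sketch of difference-of-means steering on synthetic activations. It is not the Activation-Steered Compression implementation: the hidden size, the synthetic data, the helper names `extract_steering_vector` and `steer`, and the steering strength are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN_DIM = 16  # hypothetical hidden size, chosen only to keep the toy small
N_PAIRS = 50     # mirrors the "50 paired examples" mentioned in the note

# Synthetic stand-ins for hidden states (assumption: real activations would be
# read from a model layer). Verbose states equal concise states shifted along
# one planted direction, plus a little noise.
true_direction = rng.normal(size=HIDDEN_DIM)
true_direction /= np.linalg.norm(true_direction)
concise_acts = rng.normal(size=(N_PAIRS, HIDDEN_DIM))
noise = 0.1 * rng.normal(size=(N_PAIRS, HIDDEN_DIM))
verbose_acts = concise_acts + 3.0 * true_direction + noise


def extract_steering_vector(verbose, concise):
    """Average the paired differences and normalize to unit length."""
    diff = (verbose - concise).mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(hidden, direction, strength):
    """Shift a hidden state along `direction`; negative strength moves it
    toward the concise side of the pairs."""
    return hidden + strength * direction


steering_vector = extract_steering_vector(verbose_acts, concise_acts)

# How well the recovered vector matches the planted direction.
cosine = float(steering_vector @ true_direction)

# Steering one verbose state lowers its projection on the vector by `strength`.
h = verbose_acts[0]
h_steered = steer(h, steering_vector, strength=-3.0)
proj_before = float(h @ steering_vector)
proj_after = float(h_steered @ steering_vector)
print(f"cosine with planted direction: {cosine:.3f}")
print(f"projection before: {proj_before:.3f}, after: {proj_after:.3f}")
```

In a real model the activations would be read from a chosen layer and the shift applied at inference time. Which layer and what strength work is an empirical question this toy does not answer.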

Can stochastic latent reasoning let models explore multiple solutions?

GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent probability distributions over solutions rather than single points. This lets recursive reasoners maintain uncertainty, explore alternatives, and handle ambiguous or multi-solution problems that deterministic single-path designs cannot.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.
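
A small worked example of the normalization effect, assuming GRPO-style advantages (reward minus group mean, divided by group standard deviation) and binary rewards. The function name and group sizes are hypothetical. It shows why one lucky success among many failures receives a much larger advantage than a success on a moderately hard problem.

```python
import statistics


def group_relative_advantages(rewards):
    """Normalize rewards within one rollout group: (r - mean) / std.

    Uses the population standard deviation; a group with identical rewards
    carries no learning signal, so it returns all zeros.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Nearly impossible problem: 1 accidental success in 16 rollouts.
hard_group = [1.0] + [0.0] * 15
# Moderately hard problem: 8 successes in 16 rollouts.
moderate_group = [1.0] * 8 + [0.0] * 8

hard_adv = group_relative_advantages(hard_group)
moderate_adv = group_relative_advantages(moderate_group)

print(f"advantage of the lone success:  {hard_adv[0]:.3f}")
print(f"advantage of a typical success: {moderate_adv[0]:.3f}")
```

With binary rewards, the advantage of a success reduces to sqrt((1 - p) / p) for group success rate p, so it grows as successes get rarer. Whatever the lucky rollout happened to do, including repeating an answer or skipping computation, is reinforced in proportion.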

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a capability researcher re-testing claims about inductive bias versus model capacity in LLMs. The question remains: what actually limits reasoning performance—raw model size, or the built-in priors baked into architecture and pretraining?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; treat as anchors to verify, not current truth.
- Base models contain latent reasoning ability; post-training *selects* it rather than creating it; reasoning verbosity is a single linear direction in activation space, editable without retraining (~2025).
- Cognitive biases are planted in pretraining, not fine-tuning; models sharing a pretrained backbone show similar bias patterns regardless of fine-tuning data (~2025).
- Procedural knowledge (how-to patterns) from diverse pretraining documents drives generalization; factual recall depends on narrow memorization (~2024).
- High apparent reasoning performance often masks conservative defaults and human cognitive shortcuts (content effects, weak explaining-away, Markov violations); 12 of 14 models performed worse when constraints were removed (~2026, ~2025, ~2022).
- Overly difficult RLVR samples install degenerate shortcuts that contaminate latent abilities (~2026).

Anchor papers (verify; mind their dates):
- arXiv:2507.07186 (2025-07): Cognitive biases planted in pretraining
- arXiv:2507.04742 (2025-07): Activation steering for chain-of-thought compression
- arXiv:2605.28388 (2026-05): Sample difficulty in RLVR
- arXiv:2502.10215 (2025-02): Do LLMs reason causally like humans?

Your task:
(1) RE-TEST EACH CONSTRAINT. Has model scaling, better pretraining curricula, improved RLVR harnesses, mechanistic interpretability tooling, or multi-agent orchestration since RELAXED or OVERTURNED any of these findings? Separate the durable question (bias vs. capacity trade-off likely remains real) from the perishable claim (e.g., that a single linear direction controls reasoning verbosity—true in 2025, but does newer steering break it?). Where a constraint still appears to hold, cite what held it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Does any recent paper argue capacity, not bias, is the true bottleneck? Or show that fine-tuning CAN override pretraining bias?
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., "If reasoning is a learnable activation direction, can we engineer it into weak models without pretraining?"; "Does chain-of-thought verbosity compress identically across model families, or does architecture-specific bias defeat the direction?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
