INQUIRING LINE

Can a model learn patterns more complex than its training data — or is it always just a mirror of what it was fed?

Can trained models encode programs more complex than their data-generating process?

This explores whether learning can install a program richer than the process that produced the training data — or whether a trained model is capped at re-encoding the patterns it was fed.

The question here is whether training can make a model *more* than its data, encoding computation more complex than the process that generated what it learned from. The corpus splits sharply between what is possible in principle and what training actually delivers, and the gap between the two is the real story.

In principle, the ceiling is extremely high. A single finite-size transformer provably exists that can compute *any* computable function given the right prompt — the architecture is Turing complete, so the program a model could encode is unbounded by anything in its data (Can a single transformer become universally programmable through prompts?). But that same result carries the deflating caveat: standard training rarely produces models that learn to implement arbitrary programs this way. The capacity to encode complex programs and the training process that would *install* them are two different things.

What training actually installs, across multiple notes, looks more like a re-encoding of the data-generating distribution than a richer program on top of it. RL fine-tuning sharpens memorization rather than installing reasoning procedures: GRPO-trained models show sharp performance drops on out-of-distribution variants (Do fine-tuned language models actually learn optimization procedures?). Models asked to run iterative numerical methods don't execute the procedure at all; they pattern-match memorized templates and emit plausible-but-wrong values (Do large language models actually perform iterative optimization?). Instruction tuning turns out to transfer knowledge of the *output space*, not task understanding: models trained on semantically empty or deliberately incorrect instructions perform about as well as those trained on correct ones (Does instruction tuning teach task understanding or output format?). And RL post-training tends to amplify one already-present pretraining format while suppressing the others, rather than synthesizing something new (Does RL training collapse format diversity in pretrained models?). Even the reasoning traces that look like complex programs are stylistic mimicry: invalid logical steps perform nearly as well as valid ones (Do reasoning traces show how models actually think?).
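The difference between executing a procedure and matching memorized templates can be made concrete with a toy sketch. This is an analogy only, not code from any of the cited notes: `newton_sqrt` stands for a genuinely executed iterative method, and the hypothetical `TemplateMatcher` stands for a model that answers with the stored result of the most similar training example.

```python
def newton_sqrt(x, steps=20):
    """Run Newton's iteration for sqrt(x): the procedure a solver should execute."""
    guess = max(float(x), 1.0)
    for _ in range(steps):
        guess = 0.5 * (guess + x / guess)
    return guess


class TemplateMatcher:
    """Hypothetical stand-in for a model that only memorizes worked examples."""

    def __init__(self, train_inputs):
        # Store the answer for every training input; nothing else is learned.
        self.table = {x: newton_sqrt(x) for x in train_inputs}

    def predict(self, x):
        # Answer with the stored result of the most similar training input.
        nearest = min(self.table, key=lambda seen: abs(seen - x))
        return self.table[nearest]


matcher = TemplateMatcher(train_inputs=[1, 4, 9, 16, 25])

# Error on an input seen in training, on an unseen input, and for the real procedure.
in_distribution_error = abs(matcher.predict(16) - 16 ** 0.5)
out_of_distribution_error = abs(matcher.predict(400) - 400 ** 0.5)
procedure_error = abs(newton_sqrt(400) - 400 ** 0.5)
```

On the training inputs the matcher looks flawless, but for an unseen input such as 400 it returns the memorized answer for 25, a plausible-looking number that is far from correct, while the executed procedure stays accurate.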

There's a formal reason the answer trends toward 'not on its own.' Self-improvement is bounded by the generation-verification gap — a model can't reliably encode a fix more complex than what it already has unless something *external* validates and enforces it (What stops large language models from improving themselves?). That's the precise statement of why a model can't bootstrap past its data-generating process through metacognition alone. You can even predict where it breaks: framing the model as an autoregressive probability machine correctly forecasts that low-probability target tasks stay hard no matter how logically simple they are (Can we predict where language models will fail?).
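A toy simulation can illustrate why an external check matters. It is a sketch under invented assumptions, not the formal result from the cited note: `noisy_generator` is a hypothetical model that is off by one 40% of the time, and `external_verifier` is a hypothetical checker with ground-truth access that the generator lacks.

```python
import random


def noisy_generator(a, b, rng, error_rate=0.4):
    """Hypothetical generator: returns a + b, but is off by one with probability error_rate."""
    answer = a + b
    if rng.random() < error_rate:
        answer += rng.choice([-1, 1])
    return answer


def external_verifier(a, b, candidate):
    """Hypothetical external check with access to the ground truth."""
    return candidate == a + b


def solve_without_verifier(a, b, rng, n=5):
    # With no external signal there is no reliable basis for preferring
    # one sample over another, so the first sample is returned.
    candidates = [noisy_generator(a, b, rng) for _ in range(n)]
    return candidates[0]


def solve_with_verifier(a, b, rng, n=5):
    # Same samples, but the external verifier selects a correct one if any exists.
    candidates = [noisy_generator(a, b, rng) for _ in range(n)]
    for candidate in candidates:
        if external_verifier(a, b, candidate):
            return candidate
    return candidates[0]


def accuracy(solver, trials=2000, seed=0):
    """Fraction of random addition problems the solver answers correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        correct += solver(a, b, rng) == a + b
    return correct / trials


baseline_accuracy = accuracy(solve_without_verifier)
verified_accuracy = accuracy(solve_with_verifier)
```

Both solvers draw identical samples for a given seed, so any improvement in `verified_accuracy` over `baseline_accuracy` comes entirely from the external check rather than from a better generator.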

The interesting wrinkle, and the place where the answer may surprise you, is that the complexity may be *present but hidden* rather than absent. Transformers trained with hidden chain-of-thought compute the correct answer in early layers and then actively overwrite it to produce format-compliant filler; the richer computation is fully recoverable from lower-ranked predictions (Do transformers hide reasoning before producing filler tokens?). And techniques like post-completion learning exploit unused sequence space to internalize self-evaluation the data never explicitly taught (Can models learn to evaluate their own work during training?). So the honest synthesis is: the architecture can hold programs far more complex than the data-generating process, but ordinary training mostly compresses the data distribution instead. Going beyond it seems to require either an external verifier to break the generation-verification ceiling, or deliberate methods to surface the latent computation the model is otherwise trained to suppress.
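What "recoverable from lower-ranked predictions" means can be sketched with a toy example. The logit values and token strings below are invented for illustration, and the helper names are hypothetical; a real logit-lens analysis would instead project intermediate hidden states through the model's unembedding matrix.

```python
def ranked_tokens(logits):
    """Return tokens sorted from highest to lowest logit."""
    return sorted(logits, key=logits.get, reverse=True)


def decode_skipping_filler(logits, filler_tokens):
    """Return the highest-ranked token that is not a filler token."""
    for token in ranked_tokens(logits):
        if token not in filler_tokens:
            return token
    return None


# Invented logits for one output position: the filler token wins,
# but the answer token sits directly below it in the ranking.
final_position_logits = {"...": 9.1, "7": 6.3, "4": 1.2, "the": 0.4}
FILLER = {"..."}

top_token = ranked_tokens(final_position_logits)[0]
hidden_answer = decode_skipping_filler(final_position_logits, FILLER)
```

Greedy decoding emits only `top_token`, the filler, while the answer remains readable one rank down. That is the sense in which the computation is suppressed in the output rather than missing from the model.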

Sources 10 notes

Can a single transformer become universally programmable through prompts?

Research proves a single finite-size transformer exists that can compute any computable function given the right prompt, achieving complexity bounds nearly matching unbounded models. However, standard training rarely produces models that learn to implement arbitrary programs this way.

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Does instruction tuning teach task understanding or output format?

Models trained on semantically empty or deliberately incorrect instructions perform comparably to those trained on full, correct instructions (43% versus 42.6% for a random baseline). The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Do reasoning traces show how models actually think?

LLM reasoning traces function as persuasive appearances rather than reliable explanations of computation. Invalid logical steps perform nearly as well as valid ones, and corrupted traces generalize comparably, showing that semantic correctness is not what produces the performance gains.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Can we predict where language models will fail?

By framing LLMs as autoregressive probability machines, researchers predicted tasks with low-probability target responses would be systematically harder, even when logically simple. Experiments confirmed predictions like backwards alphabet and letter counting.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can models learn to evaluate their own work during training?

Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about model capability boundaries. The question: can trained models encode programs more complex than their data-generating process? A curated library of LLM research (2023–2026) found the following — and these are dated claims, not current truth:

**What a curated library found — and when (dated claims, not current truth):**
Findings span 2023–2026; treat each as a hypothesis to re-examine.
- Transformers are Turing complete in principle, but standard training installs re-encoded data distributions, not richer programs (~2024-11).
- RL fine-tuning sharpens memorization; GRPO-trained models collapse on out-of-distribution variants; reasoning traces are stylistic mimicry, not logical execution (~2025-04).
- Self-improvement is bounded by the generation-verification gap — models cannot bootstrap past their data-generating process without external validation (~2024-12).
- Richer computation exists but is hidden: transformers compute correct answers in early layers, then overwrite them; post-completion learning exploits unused sequence space (~2024-12, ~2025-07).
- Invalid logical steps perform nearly as well as valid ones in reasoning (~2024-12).

**Anchor papers (verify; mind their dates):**
- arXiv:2411.01992 (2024-11): Turing completeness of prompting.
- arXiv:2410.18890 (2024-10): Function calling for reasoning tasks.
- arXiv:2504.07912 (2025-04): Echo Chamber — RL post-training behavior amplification.
- arXiv:2604.15726 (2026-04): LLM reasoning is latent, not chain-of-thought.

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For every finding above, judge whether newer models (o3, o4, Grok-3, Claude-4+), training methods (scaling laws, synthetic data bootstrapping, mixture-of-experts reasoning), tooling (multi-agent orchestration, external verifiers, formal proof checkers), or evals have since RELAXED or OVERTURNED it. Separate the durable question ("Can models *in principle* encode super-data complexity?") from the perishable limitation ("Can *training* install it?"). Cite what resolved each constraint, and flag where it still holds.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially any showing that models *do* spontaneously encode novel programs, or that external validation *has* been systematized into training pipelines.
(3) Propose 2 research questions that ASSUME the training regime may have moved: e.g., "Given post-completion and latent-reasoning results, does chain-of-thought supervision now *suppress* rather than enable program encoding?" or "Do scaling laws predict when hidden-computation density crosses a threshold where it cannot be overwritten?"

**Cite arXiv IDs; flag anything you cannot ground in a real paper.**
