INQUIRING LINE

Do transformers learn generalizable algorithms or instance-based patterns?

This explores whether transformers solve problems by learning the underlying rule (an algorithm that transfers to new cases) or by stitching together patterns memorized from training — and what tips them one way or the other.


The corpus's honest answer is that by default transformers mostly learn instance-shaped patterns, but that default can be changed. The pessimistic core comes from work showing that what looks like reasoning is often lookup in disguise. On close inspection, compositional reasoning collapses into "linearized subgraph matching": the model succeeds on in-distribution problems by recalling computation paths it saw during training, then fails sharply on novel combinations, with errors compounding step by step (see "Do transformers actually learn systematic compositional reasoning?"). The same shape shows up at the level of world models. Transformers trained on orbital mechanics or board games don't extract the unified law underneath; they accrete task-specific heuristics. Arithmetic, for instance, runs on "range-matching" tricks rather than an addition algorithm, and fine-tuning exposes the seams by producing slice-dependent, nonsensical rules (see "Do foundation models learn world models or task-specific shortcuts?"). A vivid micro-version: models develop attention toward the right tokens in graph data yet barely notice when the actual edges are shuffled. They recognize "this is a graph" as a category instead of using the connections (see "Can language models actually use graph structure information?").
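The pattern-versus-algorithm distinction can be made concrete with a toy contrast. This is an illustrative assumption, not a claim about what happens inside any transformer: `LookupLearner` memorizes training sums (a caricature of instance-based recall), while `carry_add` applies a digit-and-carry rule. Both names, and the `accuracy` helper, are hypothetical.

```python
import random

# Toy illustration only: an instance-based learner that memorizes training sums
# versus a digit-wise carry algorithm that applies to operands of any length.

def carry_add(a: str, b: str) -> str:
    """Grade-school addition on decimal strings, one digit and carry at a time."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad both to the same length
    digits, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


class LookupLearner:
    """Memorizes (a, b) -> sum pairs seen in training; abstains otherwise."""

    def __init__(self, pairs):
        self.table = {(a, b): carry_add(a, b) for a, b in pairs}

    def predict(self, a: str, b: str):
        return self.table.get((a, b))  # None for any unseen pair


def accuracy(predict, pairs) -> float:
    return sum(predict(a, b) == str(int(a) + int(b)) for a, b in pairs) / len(pairs)


rng = random.Random(0)
train = [(str(rng.randrange(100)), str(rng.randrange(100))) for _ in range(500)]
test_long = [(str(rng.randrange(10**6)), str(rng.randrange(10**6))) for _ in range(200)]

lookup = LookupLearner(train)
print(accuracy(lookup.predict, train))      # 1.0: every memorized instance is recalled
print(accuracy(lookup.predict, test_long))  # longer operands are unseen, so lookup abstains
print(accuracy(carry_add, test_long))       # 1.0: the rule transfers to new lengths
```

The lookup learner is perfect on what it has seen and useless beyond it; the rule has no notion of "seen" at all, which is what transfer to new cases means here.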


Sources (9 notes)

Do transformers actually learn systematic compositional reasoning?

Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.
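One simple way to see why errors compound is to assume, purely for illustration, that each reasoning step succeeds independently with probability p, so that end-to-end accuracy decays as p^n. The independence assumption and the name `chain_accuracy` are ours, not the cited work's.

```python
def chain_accuracy(p_step: float, n_steps: int) -> float:
    """Probability that all n steps are correct, assuming independent per-step success p."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability")
    return p_step ** n_steps


# Accuracy of a multi-step chain at a fixed 95% per-step success rate.
for n in (1, 5, 10, 20):
    print(n, round(chain_accuracy(0.95, n), 3))
```

Under this assumption, even 95% per-step accuracy leaves a 20-step chain fully correct only about 36% of the time. Real errors need not be independent, so treat this as an intuition pump rather than a model of the cited results.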

Do foundation models learn world models or task-specific shortcuts?

Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.

Can language models actually use graph structure information?

LLMs develop attention shifts toward node tokens after training, but randomly shuffled topology barely affects performance. Models treat graph data as a category to recognize rather than as structured relationships to use.
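The shuffled-topology control can be sketched as follows, assuming a graph is given as an undirected edge list: keep the node set and edge count fixed, randomize which nodes are connected, and check whether task accuracy moves. The function `shuffle_edges` is hypothetical and is not taken from the cited work.

```python
import random


def shuffle_edges(nodes, edges, seed=None):
    """Return a random simple undirected graph with the same nodes and edge count.

    This destroys the original topology while preserving surface statistics
    (which node tokens appear, how many edges there are) that a model treating
    graphs as a category could still rely on.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    if len(edges) > max_edges:
        raise ValueError("more edges than a simple graph allows")
    new_edges = set()
    while len(new_edges) < len(edges):
        u, v = rng.sample(nodes, 2)             # two distinct endpoints, so no self-loops
        new_edges.add((min(u, v), max(u, v)))   # canonical order avoids duplicate edges
    return sorted(new_edges)


nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
shuffled = shuffle_edges(nodes, edges, seed=1)
print(shuffled)
```

Comparing a model's accuracy on prompts built from `edges` versus `shuffled` measures its sensitivity to topology; a near-zero gap is the signature the note describes.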

Can looped transformers generalize to unseen knowledge combinations?

Recurrent-depth transformers with shared parameters across iterations enable systematic generalization and depth extrapolation that vanilla transformers cannot achieve. This emerges through a sharp three-phase process: memorization, in-distribution, then out-of-distribution generalization.
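The core architectural idea is one block whose parameters are reused on every iteration, so test-time depth can exceed training depth without adding weights. It can be sketched in NumPy. This is a toy residual MLP standing in for a full transformer block (no attention, no normalization), and `LoopedBlock` and its sizes are hypothetical.

```python
import numpy as np


class LoopedBlock:
    """One residual block whose weights are shared across every iteration."""

    def __init__(self, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
        self.b = np.zeros(d_model)

    def num_params(self) -> int:
        return self.W.size + self.b.size

    def __call__(self, h: np.ndarray, n_loops: int) -> np.ndarray:
        # The same W and b are applied n_loops times. Depth is a runtime choice,
        # so extra iterations at test time add compute but no new parameters.
        for _ in range(n_loops):
            h = h + np.tanh(h @ self.W + self.b)
        return h


block = LoopedBlock(d_model=8)
x = np.ones((2, 8))              # a batch of 2 token states
shallow = block(x, n_loops=4)    # e.g. the depth used during training
deep = block(x, n_loops=12)      # extrapolated depth at inference
print(block.num_params(), shallow.shape, deep.shape)  # 72 (2, 8) (2, 8)
```

A vanilla stack would need three times the parameters to run 12 layers instead of 4; here the parameter count is the same at both depths, which is what makes depth extrapolation possible to test at all.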

Can transformers improve exponentially by learning from their own correct solutions?

Standard transformers generalize from 10-digit to 100-digit addition by repeatedly generating solutions, filtering for correctness, and retraining—showing exponential (not linear) out-of-distribution improvement across rounds without saturation.
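One round of the generate, filter, and retrain loop can be sketched as below. Everything here is an illustrative assumption: `self_improve_round`, `flaky_model`, and the prompt format are hypothetical. An exact-arithmetic verifier stands in for whatever filter the cited work uses, since the note says only "filtering for correctness". Fine-tuning on the returned examples, then calling the function again with `n_digits + 1`, is left to the caller.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, answer)


def self_improve_round(
    model: Callable[[int, int], int],
    n_digits: int,
    n_samples: int,
    verify: Callable[[int, int, int], bool],
    seed: int = 0,
) -> List[Example]:
    """Sample n_digits-digit addition problems; keep only outputs that pass verify."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        a = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
        b = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
        pred = model(a, b)
        if verify(a, b, pred):
            kept.append((f"{a}+{b}=", str(pred)))
    return kept  # training data for the next fine-tuning round


def flaky_model(a: int, b: int) -> int:
    # Hypothetical stand-in for a trained model: wrong whenever a + b is divisible by 7.
    return a + b + (1 if (a + b) % 7 == 0 else 0)


data = self_improve_round(flaky_model, n_digits=11, n_samples=100,
                          verify=lambda a, b, s: s == a + b)
print(len(data), data[:2])
```

The filter is what keeps each round from training on its own mistakes; without it, errors at the new length would be fed back in as targets.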

How do transformers learn to reason across multiple steps?

Controlled training reveals transformers learn multi-hop reasoning in three phases: memorization, in-distribution generalization, and cross-distribution reasoning. Successful reasoning correlates with cosine clustering of entity representations, and second-hop generalization requires explicit compositional exposure during training.
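One way to quantify "cosine clustering" is to compare the mean cosine similarity within groups of entities (for example, entities that share a bridge relation) with the mean across groups. The grouping scheme, the metric, and the name `clustering_gap` are illustrative assumptions, not the cited work's exact measure.

```python
from itertools import combinations

import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def clustering_gap(reps: dict, groups: dict) -> float:
    """Mean within-group cosine similarity minus mean between-group similarity.

    reps:   entity name -> hidden-state vector
    groups: entity name -> group label (needs at least two groups, one with 2+ members)
    """
    within, between = [], []
    for a, b in combinations(reps, 2):
        sim = cosine(reps[a], reps[b])
        (within if groups[a] == groups[b] else between).append(sim)
    return float(np.mean(within) - np.mean(between))


# Toy 2-D "representations": two tight, well-separated clusters.
reps = {
    "e1": np.array([1.0, 0.1]), "e2": np.array([0.9, 0.2]),
    "e3": np.array([0.1, 1.0]), "e4": np.array([0.2, 0.9]),
}
groups = {"e1": "g1", "e2": "g1", "e3": "g2", "e4": "g2"}
print(clustering_gap(reps, groups))  # positive: groups are tight and separated
```

Tracking this gap across training checkpoints is one way to look for the correlation the note describes between representation clustering and successful multi-hop reasoning.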

Can explicit stack tracking improve how transformers learn recursive syntax?

Pushdown Layers, a drop-in self-attention replacement with explicit stack tracking, achieve 3-5x more sample-efficient syntactic generalization while maintaining perplexity. The improvement suggests that recursive structure specifically benefits from an architectural inductive bias, even though compositional generalization in general tends to emerge with scale.
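The kind of state an explicit stack gives a model can be illustrated with a toy tracker over a bracketed parse: each token gets its nesting depth and the position of the bracket that opened its constituent. This is a simplified illustration of stack tracking in general, not the Pushdown Layers mechanism itself, and the name `stack_annotate` is hypothetical.

```python
from typing import List, Optional, Tuple


def stack_annotate(tokens: List[str]) -> List[Tuple[str, int, Optional[int]]]:
    """Annotate each token as (token, depth, index of the enclosing '(' or None).

    A stack of open-bracket positions is pushed on '(' and popped on ')', so
    recursion depth is tracked explicitly rather than inferred from context.
    A '(' and its matching ')' receive the same depth and parent.
    """
    stack: List[int] = []
    out = []
    for i, tok in enumerate(tokens):
        if tok == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            stack.pop()
        out.append((tok, len(stack), stack[-1] if stack else None))
        if tok == "(":
            stack.append(i)
    if stack:
        raise ValueError("unclosed '('")
    return out


tokens = "( the dog ( that barked ) ran )".split()
for row in stack_annotate(tokens):
    print(row)
```

Note that "ran" comes back at depth 1 with parent 0, correctly skipping over the embedded clause; recovering that attachment is exactly what plain attention must infer and an explicit stack gives for free.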

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.
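The claim that ablations affect only their corresponding function can be summarized as an accuracy table: ablate the subnetwork found for each subroutine, re-evaluate every subroutine, and compare the drop on the target with the drop elsewhere. The table values and the name `ablation_selectivity` below are hypothetical placeholders, not results from the cited experiments.

```python
from typing import Dict


def ablation_selectivity(
    baseline: Dict[str, float],
    ablated: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """For each ablated subnetwork, compare the accuracy drop on its own task
    with the mean drop on all other tasks.

    baseline: task -> accuracy of the intact model
    ablated:  task whose subnetwork was removed -> (task -> accuracy after removal)
    """
    report = {}
    for target, accs in ablated.items():
        drops = {task: baseline[task] - acc for task, acc in accs.items()}
        others = [d for task, d in drops.items() if task != target]
        report[target] = {
            "target_drop": drops[target],
            "mean_off_target_drop": sum(others) / len(others) if others else 0.0,
        }
    return report


# Made-up numbers for illustration only.
baseline = {"A": 0.98, "B": 0.97}
ablated = {"A": {"A": 0.51, "B": 0.96}, "B": {"A": 0.97, "B": 0.50}}
print(ablation_selectivity(baseline, ablated))
```

A modular subroutine shows a large target drop with a near-zero off-target drop; comparing this pattern with and without pretraining is one way to probe the consistency claim in the note.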

Can a single transformer become universally programmable through prompts?

Research proves a single finite-size transformer exists that can compute any computable function given the right prompt, achieving complexity bounds nearly matching unbounded models. However, standard training rarely produces models that learn to implement arbitrary programs this way.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a synthesis auditor tracking whether transformers learn generalizable algorithms or instance patterns. This question remains open—do not assume prior findings hold unchanged.

What a curated library found—and when (dated claims, not current truth):
Findings span 2023–2026. A curated library reported:
• Compositional reasoning collapses into "linearized subgraph matching"—models recall computation paths from training; novel combinations fail with compounding errors (~2023).
• Transformers develop task-specific heuristics (e.g., arithmetic via "range-matching") rather than unified world models; fine-tuning exposes slice-dependent, nonsensical rules (~2024).
• LLM attention recognizes graph category membership but ignores actual edge topology—categorization over connection-modeling (~2023).
• Prompting alone achieves Turing completeness in theory; RL + Transformers and self-improving variants show length generalization (~2024–2025).
• Recurrent-depth architectures and implicit-reasoning probes suggest latent algorithmic capacity under certain training regimes (~2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2305.18654 (Faith and Fate, 2023)
• arXiv:2405.00208 (Primer on Inner Workings, 2024)
• arXiv:2501.14176 (RL + Transformer, 2025)
• arXiv:2507.06952 (Probing World Models, 2025)

Your task:
(1) RE-TEST each constraint. For instance-pattern dominance: have scaling, architectural variants (recurrence, memory, tool-use), training procedures (RL, curriculum), or eval frameworks since OVERTURNED the pattern-collapse thesis? Separate the durable claim (transformers *default* to patterns) from the perishable one (this cannot be changed). Cite what changed it.
(2) Surface the strongest WORK CONTRADICTING the pessimistic synthesis—especially recent papers arguing transformers *do* extract genuine algorithms under specific conditions.
(3) Propose two research questions that assume the regime has shifted: (a) Under what training + architecture + scale do transformers transition from pattern-matching to algorithmic learning? (b) Can implicit reasoning + recurrent depth recover Turing-complete computation without explicit prompting?

Cite arXiv IDs; flag anything ungrounded.
