INQUIRING LINE

Inquiring lines›How do language models construct a…›How does AI persuasion undermine h…›Does recurrence enable reasoning c…›this inquiring line

For some problems, step-by-step thinking isn't optional — and making an AI bigger or wider simply cannot substitute for it.

Can any architecture fundamentally solve problems that require inherently sequential computation?

This explores whether the limits on solving inherently sequential problems are about the architecture you pick (transformers vs. recurrent vs. something new) or about a deeper computational barrier no design escapes — and what the corpus says about routing around it.

This question is really two questions wearing one coat: is there a hard wall around problems that demand step-by-step computation, and if so, can any architecture climb over it? The corpus's sharpest answer comes from complexity theory. The serial scaling hypothesis argues that some problems genuinely require polynomial-depth reasoning, and that parallel architectures like the Transformer cannot solve them no matter how large you scale — width and parameters don't substitute for depth Can parallel architectures solve inherently sequential problems?. So the honest framing is: the wall is real, but it's a wall against a *class* of architectures (fixed-depth, parallel), not against computation itself.

The escape route the corpus keeps returning to is recurrence — buying serial depth back by computing in time rather than in layers. The Hierarchical Reasoning Model couples slow abstract planning with fast detailed steps across two timescales, and with only 27M parameters solves Sudoku and mazes where chain-of-thought collapses entirely, explicitly breaking the AC0/TC0 ceiling that traps fixed-depth transformers Can recurrent hierarchies achieve reasoning that transformers cannot?. Chain-of-thought itself is a softer version of the same trick: on compositional tasks like graph connectivity it earns an *exponential* advantage over parallel voting, precisely because the answer requires accumulating intermediate results in order, which short parallel chains can't fake When does sequential reasoning beat parallel voting?. The pattern: you don't beat sequential problems by changing the substrate, you beat them by letting the system take more genuine steps.

Here's the surprise the corpus hands you. A single finite-size transformer is already Turing complete — there exists a fixed network that can compute any computable function given the right prompt Can a single transformer become universally programmable through prompts?. So in principle the architecture is not the bottleneck at all; the bottleneck is whether training ever teaches the model to *use* that latent universality. That reframes the whole debate: reasoning models persistently beat non-reasoning ones at equal inference budget not because they have more raw capacity, but because training installs a protocol that makes each extra step productive Can non-reasoning models catch up with more compute?.

The most counterintuitive corner is that you can often dodge the depth wall by re-shaping the problem rather than deepening the model. MAKER decomposes million-step tasks into minimal subtasks with voting at each step and runs them to zero errors — using *small, non-reasoning* models, inverting the assumption that hard sequential problems need bigger brains Can extreme task decomposition enable reliable execution at million-step scale?. Recursive subtask trees with cache pruning sustain reasoning past context limits Can recursive subtask trees overcome context window limits?, and Atom of Thoughts goes further, making each state depend only on the current contracted subproblem — a memoryless, Markov-style chain that keeps coherence without dragging history Can reasoning systems forget history without losing coherence?. Separating the planner from the solver helps too, since the decomposition skill generalizes across domains even when solving doesn't Does separating planning from execution improve reasoning accuracy?.

The sobering counterweight: capability in principle isn't competence in practice. Frontier reasoning models score only 20–23% on constraint-satisfaction problems that demand sustained backtracking, meaning fluent-looking reflection doesn't translate to actually solving unfamiliar sequential structure Can reasoning models actually sustain long-chain reflection?. So the full answer: no architecture *evades* the need for sequential computation, but several — recurrence, decomposition, prompted universality — supply it. The frontier isn't inventing an architecture that cheats the wall; it's getting models to reliably spend the serial steps they're already capable of taking.

Sources 10 notes

Can parallel architectures solve inherently sequential problems?

Complexity theory proves that problems requiring polynomial-depth reasoning cannot be solved by parallel architectures like Transformers, even with infinite scaling. Progress requires recurrent structures that increase serial computation depth.

Can recurrent hierarchies achieve reasoning that transformers cannot?

The Hierarchical Reasoning Model couples slow abstract planning with fast detailed computation across two timescales, achieving near-perfect performance on Sudoku and mazes where chain-of-thought methods fail completely. With only 27M parameters and 1,000 samples, HRM escapes the AC0/TC0 complexity ceiling that constrains fixed-depth transformers.

When does sequential reasoning beat parallel voting?

On structured tasks requiring sequential multi-step reasoning like graph connectivity, chain-of-thought achieves exponentially higher accuracy than parallel voting. The difference emerges because solutions genuinely require accumulating intermediate results sequentially, which short parallel chains cannot achieve.

Can a single transformer become universally programmable through prompts?

Research proves a single finite-size transformer exists that can compute any computable function given the right prompt, achieving complexity bounds nearly matching unbounded models. However, standard training rarely produces models that learn to implement arbitrary programs this way.

Can non-reasoning models catch up with more compute?

Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.

Show all 10 sources

Can extreme task decomposition enable reliable execution at million-step scale?

MAKER solves million-step tasks with zero errors by decomposing into minimal subtasks, applying voting at each step, and flagging correlated errors. Surprisingly, small non-reasoning models suffice when decomposition is extreme enough, inverting the standard approach to hard problems.

Can recursive subtask trees overcome context window limits?

The Thread Inference Model demonstrates that reasoning structured as recursive subtask trees with rule-based KV cache pruning sustains accurate reasoning beyond context limits, even when manipulating 90% of the cache. This enables single models to replace multi-agent systems by handling full recursive reasoning internally.

Can reasoning systems forget history without losing coherence?

Atom of Thoughts decomposes problems into DAGs and contracts them iteratively, ensuring each state depends only on the current problem—not prior steps. This memoryless approach eliminates historical baggage that bloats reasoning while maintaining answer equivalence.

Does separating planning from execution improve reasoning accuracy?

Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.

Can reasoning models actually sustain long-chain reflection?

DeepSeek-R1 and o1-preview achieve only 20-23.6% exact match on 850 constraint satisfaction problems requiring genuine backtracking. This ceiling reveals that reflective reasoning fluency does not translate to actual problem-solving competence on unfamiliar instance structures.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity2.56 match · arxiv ↗
Hierarchical Reasoning Model2.52 match · arxiv ↗
Faith and Fate: Limits of Transformers on Compositionality2.39 match · arxiv ↗
Ask, and it shall be given: Turing completeness of prompting1.74 match · arxiv ↗
Atom of Thoughts for Markov LLM Test-Time Scaling1.74 match · arxiv ↗
Reasoning Models Can Be Effective Without Thinking1.74 match · arxiv ↗
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning1.72 match · arxiv ↗
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention1.70 match · arxiv ↗

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an LLM research analyst re-testing claims about sequential computation and architecture limits. The question: *Can any architecture fundamentally solve problems requiring inherently sequential computation?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2026; treat as perishable:

• The Serial Scaling Hypothesis: some problems need polynomial-depth reasoning; fixed-depth parallel architectures (e.g., standard Transformers) cannot solve them regardless of scale (2025, arXiv:2507.12549).
• Recurrence buys serial depth: Hierarchical Reasoning Model (27M params) solves Sudoku/mazes by explicit time-unrolled steps, breaking AC0/TC0 ceilings that trap shallow Transformers (2025, arXiv:2506.21734).
• Single finite Transformer is Turing complete in principle — the bottleneck is training, not substrate; reasoning models beat non-reasoning ones because training installs a step-productive protocol (2024, arXiv:2411.01992).
• Task decomposition + voting dodges depth walls: MAKER solves million-step tasks with small non-reasoning models and minimal subtasks (2025, arXiv:2511.09030); Atom of Thoughts uses Markov-style memoryless states (2025, arXiv:2502.12018).
• Frontier reasoning models score only 20–23% on constraint-satisfaction requiring sustained backtracking — fluent reflection ≠ solving unfamiliar sequential structure.

Anchor papers (verify; mind their dates):
• arXiv:2507.12549 (2025): Serial Scaling Hypothesis
• arXiv:2506.21734 (2025): Hierarchical Reasoning Model
• arXiv:2411.01992 (2024): Turing completeness of prompting
• arXiv:2511.09030 (2025): Solving a Million-Step LLM Task

Your task:
(1) RE-TEST EACH CONSTRAINT. For each claim above, judge whether post-2026 models, training methods, inference orchestration (memory, caching, multi-step sampling), or benchmarks have relaxed or overturned it. Separate durable question (likely still open) from perishable limitation (possibly resolved); cite what resolved it. Does the Turing-completeness ceiling hold empirically? Has constraint-satisfaction performance recovered beyond 20–23%?
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially anything that *rejects* the depth-wall premise or shows shallow architectures solving traditionally sequential problems without explicit recurrence.
(3) Propose 2 research questions that ASSUME the regime has moved: e.g., if task decomposition + voting now scales to *any* sequential problem, what is the new frontier? If Turing completeness is now trainable, what blocks generalization across problem families?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

For some problems, step-by-step thinking isn't optional — and making an AI bigger or wider simply cannot substitute for it.

Related lines of inquiry

Sources 10 notes

Papers this line draws on 8