INQUIRING LINE

Can closed-form solutions compete with gradient descent optimization?

This explores whether one-shot analytical answers (closed-form, the kind LLMs reach for by pattern-matching) can hold their own against iterative, gradient-style optimization that refines an answer step by step — and the corpus reframes the question around what happens inside language models.
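To pin down the two terms before the question moves inside language models, here is a minimal sketch of the same least-squares fit solved both ways. The data, the function names, the learning rate, and the step count are all illustrative assumptions, not anything taken from the notes.

```python
# Toy problem (assumed for illustration): fit y ≈ w * x by least squares.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def closed_form(xs, ys):
    """One-shot analytical answer: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def gradient_descent(xs, ys, lr=0.01, steps=500):
    """Iterative answer: start at w = 0 and repeatedly step against the gradient."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

w_exact = closed_form(xs, ys)      # one evaluation of a formula
w_iter = gradient_descent(xs, ys)  # many small corrections
```

On a problem this simple both routes land on the same slope. The closed-form route only exists because someone derived the formula; the iterative route needs nothing but a way to measure and reduce error.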


The surprising thread across this collection is that language models try the closed-form route by default: they *recall an answer* in one shot instead of *working toward it* through iteration, and that shortcut quietly fails. When you hand an LLM an optimization problem, it doesn't run the iterations in its head; it recognizes the problem as similar to ones it has seen and emits a plausible-looking answer (Do large language models actually perform iterative optimization?). That looks like a closed-form shortcut, but it is really memorized template-matching, and it plateaus hard, at around 55–60% constraint satisfaction regardless of architecture, parameter count, or training regime (Do larger language models solve constrained optimization better?). Fine-tuning doesn't rescue it either: supervised training makes the *output* look correct, with proper JSON structure and valid identifiers, without making it feasible (Does supervised fine-tuning actually improve reasoning on optimization problems?).
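The gap between "looks correct" and "is feasible" is easy to see in a small checker. This is a hedged sketch: the capacity problem, the `WEIGHTS` table, and both function names are hypothetical stand-ins, not the benchmark behind the 55–60% figure.

```python
import json

# Hypothetical toy problem: choose items whose total weight fits CAPACITY.
CAPACITY = 10
WEIGHTS = {"a": 6, "b": 5, "c": 4}

def is_well_formed(raw):
    """Surface check: valid JSON, expected key, known identifiers."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    items = data.get("items")
    return isinstance(items, list) and all(
        isinstance(i, str) and i in WEIGHTS for i in items
    )

def is_feasible(raw):
    """Constraint check: no repeated items and total weight within CAPACITY."""
    if not is_well_formed(raw):
        return False
    items = json.loads(raw)["items"]
    no_repeats = len(set(items)) == len(items)
    return no_repeats and sum(WEIGHTS[i] for i in items) <= CAPACITY

looks_right = '{"items": ["a", "b"]}'     # well formed, but weighs 11
actually_right = '{"items": ["a", "c"]}'  # well formed and weighs 10
```

The first answer passes every formatting check and still breaks the one constraint that matters, which is the pattern the fine-tuning note describes.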

The deeper reason is architectural. Constraint solving depends on *taking things back*: discarding an invalid partial assignment and trying again. Gradient descent revises too, overwriting its current iterate at every step. Autoregressive generation can't retract a token once it's emitted, so it structurally lacks the one primitive that iterative search relies on (Why does autoregressive generation fail at constraint satisfaction?). That's why a one-shot closed-form pass isn't just weaker here; it's missing the machinery to compete.
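A small puzzle makes the missing primitive visible. The sketch below places four queens on a 4x4 board two ways. `commit_only` is only a loose analogy for left-to-right token emission (a real model samples from a learned distribution rather than taking the first legal option), and all names here are illustrative.

```python
def conflicts(placed, col):
    """True if a queen in the next row at `col` attacks any earlier queen."""
    row = len(placed)
    return any(c == col or abs(c - col) == row - r for r, c in enumerate(placed))

def commit_only(n):
    """Place row by row and never revise an earlier choice."""
    placed = []
    for _ in range(n):
        options = [c for c in range(n) if not conflicts(placed, c)]
        if not options:
            return None  # dead end, and nothing already placed can be retracted
        placed.append(options[0])
    return placed

def backtracking(n, placed=()):
    """Depth-first search that discards invalid partial assignments."""
    if len(placed) == n:
        return list(placed)
    for c in range(n):
        if not conflicts(placed, c):
            result = backtracking(n, placed + (c,))
            if result is not None:
                return result
    return None  # signal the caller to undo its choice and try the next column

stuck = commit_only(4)    # None: the first two placements leave row 3 with no legal column
solved = backtracking(4)  # [1, 3, 0, 2]
```

Both functions use the same conflict test. The only difference is that one of them is allowed to give a choice back.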

So what wins? Bringing the iteration back. Energy-Based Transformers reintroduce gradient descent *at inference time*: they assign an energy score to each input-prediction pair and minimize it, which yields a 35% higher training scaling rate and 29% more inference-compute gains than Transformer++, along with better out-of-distribution generalization (Can energy minimization unlock reasoning without domain-specific training?). Evolutionary search at inference does the same thing with a different engine: it keeps a diverse population of candidate solutions and mutates them, solving 98% of planning tasks and beating one-shot Best-of-N sampling precisely because it refuses to commit to a single trajectory (Can evolutionary search beat sampling and revision at inference time?). Tree search (MCTS) lands in the same camp, iterating over solution paths rather than guessing one (Can tree search replace human feedback in LLM training?).
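To make "gradient descent at inference time" concrete, here is a minimal sketch of the idea, not of the Energy-Based Transformer itself. The energy function is hand-written and hypothetical: it scores how badly a candidate prediction y violates y * y = x. A real model would learn its energy. The starting guess, learning rate, and step count are likewise assumptions.

```python
def energy(x, y):
    """Hypothetical energy: low when the prediction y satisfies y * y == x."""
    return (y * y - x) ** 2

def energy_grad_y(x, y):
    """Derivative of the energy with respect to the prediction y."""
    return 4.0 * y * (y * y - x)

def refine_prediction(x, y0=1.0, lr=0.05, steps=200):
    """Start from a rough guess and descend the energy surface over y."""
    y = y0
    energies = [energy(x, y)]  # energy of the initial guess
    for _ in range(steps):
        y -= lr * energy_grad_y(x, y)
        energies.append(energy(x, y))
    return y, energies

# The input x stays fixed; only the candidate prediction moves.
y, energies = refine_prediction(2.0)
```

The initial guess plays the role of the one-shot answer. Everything after it is the iteration that a single forward pass leaves out, and spending more steps buys a lower-energy prediction.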

The thing you might not have expected to learn: the contest isn't really "closed-form vs. gradient descent" as competing answers to the same math problem. It's that language models *pretend* to do closed-form, and the corpus keeps showing that the cure is to graft iterative optimization back on top of them: search, energy minimization, or an external symbolic solver. The closed-form instinct is the failure mode, not the rival.


Sources (7 notes)

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Do larger language models solve constrained optimization better?

Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.

Does supervised fine-tuning actually improve reasoning on optimization problems?

Supervised fine-tuning makes model outputs look correct—proper JSON structure, valid identifiers, expected sections—without making them physically feasible. The model learns surface features of solutions, not the reasoning to construct valid ones.

Why does autoregressive generation fail at constraint satisfaction?

The performance ceiling on constraint satisfaction problems is not a model-quality issue but an architectural limitation: autoregressive transformers cannot retract emitted tokens, while CSP solvers fundamentally depend on discarding invalid partial assignments. Symbolic solver integration works because it supplies what the architecture lacks.

Can energy minimization unlock reasoning without domain-specific training?

Energy-Based Transformers assign energy values to input-prediction pairs and use gradient descent minimization for inference, yielding 35% higher training scaling rates and 29% more inference-compute gains than Transformer++, while generalizing better on out-of-distribution data without domain-specific scaffolding.

Can evolutionary search beat sampling and revision at inference time?

Mind Evolution uses genetic algorithms with LLM-generated mutations and crossovers to significantly outperform Best-of-N and Sequential Revision on planning benchmarks. An island model sustains population diversity, preventing the premature convergence that single-trajectory refinement exhibits.

Can tree search replace human feedback in LLM training?

AlphaLLM uses tree search outcomes and three critic models to derive dense reward signals equivalent to human-labeled feedback. Tree structure naturally ranks solution paths by success, replacing the annotation oracle that standard RLHF requires.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an AI research analyst. The question remains open: Can closed-form solutions compete with gradient descent optimization in language models?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat these as perishable constraints:
• LLMs default to template-matching closed-form guesses rather than iterative optimization, plateauing at 55–60% constraint satisfaction regardless of scale (~2026).
• Autoregressive generation structurally cannot retract tokens, blocking the backtracking primitive that iterative search requires (~2025–2026).
• Energy-Based Transformers reintroduce gradient descent at inference time, outperforming standard transformers on OOD generalization and training scaling (~2025).
• Evolutionary search at inference solves 98% of planning tasks, beating one-shot Best-of-N by maintaining a diverse population of candidates (~2025).
• Supervised fine-tuning improves response formatting but does not enforce physical feasibility (~2024–2025).

Anchor papers (verify; mind their dates):
• arXiv:2507.02092 Energy-Based Transformers are Scalable Learners and Thinkers (2025-07)
• arXiv:2603.23004 Can Large Language Models Reason and Optimize Under Constraints? (2026-03)
• arXiv:2510.25992 Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning (2025-10)
• arXiv:2501.09891 Evolving Deeper LLM Thinking (2025-01)

Your task:
(1) RE-TEST THE 55–60% PLATEAU CLAIM. Has post-training (RL, DPO, or curriculum learning per arXiv:2508.08940) since lifted this ceiling? Have architectural shifts (diffusion-based reasoning per arXiv:2502.09992, or bilevel autoresearch per arXiv:2603.23420) changed the feasibility landscape? Separately, does the non-retractability constraint still hold under new decoding schemes (speculative, rejection sampling, or multi-draft)?  
(2) Surface the strongest work from the last ~6 months contradicting the "closed-form as failure mode" thesis. What, if anything, shows closed-form reasoning *scaling usefully* or *hybrid pipelines* outperforming pure iterative search?  
(3) Propose 2 research questions that assume the regime has moved: (a) Under what training objectives do LLMs learn to *emit partial solutions and refine them* without external search scaffolding? (b) Can mechanistic interpretability (per arXiv:2605.28388) reveal which internal circuits perform closed-form lookup vs. iterative reasoning, and can that separation be sharpened post-training?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

Next inquiring lines