INQUIRING LINE


When a network keeps updating its own state, what stops it from spiraling into chaos or collapsing to nothing?

How do normalization and input injection control emergence of fixed points?

This explores a dynamical-systems framing — how design choices like normalization and feeding the input back in at each step ("input injection") govern whether a network settles into stable fixed points — but the collection doesn't hold work on that mechanism directly, so the honest answer is partial.

This reads as a question from the equilibrium-model / iterative-dynamics tradition: treat a network as a process that repeatedly updates a hidden state, and ask what keeps that process from blowing up or collapsing — normalization to bound the state, and re-injecting the original input each step so the trajectory stays anchored to a stable resting point. On that specific mechanism, the collection is thin. None of the retrieved notes study normalization layers or input-injection as knobs on fixed-point convergence, so rather than pad, it's worth saying plainly: the sharp control-theory answer isn't here. What the corpus does have is the adjacent and arguably more interesting question of whether large models perform fixed-point-style iterative computation at all.
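The mechanism described above can be made concrete with a toy linear iteration. This is a minimal sketch under stated assumptions, not an equilibrium model from the literature: the update rule `z <- W z + u`, the matrices `CONTRACTIVE` and `EXPANSIVE`, the injected term `U_X`, and the unit-norm rescaling that stands in for a normalization layer are all hypothetical choices made for illustration.

```python
import math


def matvec(W, z):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def norm(z):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in z))


def iterate(W, u, z0, steps=60, inject=True, normalize=False):
    """Repeat z <- W z (+ u), optionally rescaling z to unit norm each step."""
    z = list(z0)
    for _ in range(steps):
        z = matvec(W, z)
        if inject:
            # Input injection: the same input term is added at every step.
            z = [zi + ui for zi, ui in zip(z, u)]
        if normalize:
            # Stand-in for a normalization layer: force the state onto the unit sphere.
            n = norm(z)
            if n > 0.0:
                z = [zi / n for zi in z]
    return z


# Hypothetical toy parameters (assumptions, not taken from any paper).
CONTRACTIVE = [[0.5, 0.1], [-0.1, 0.5]]  # shrinks every vector by about half
EXPANSIVE = [[1.5, 0.0], [0.0, 1.2]]     # grows every vector
U_X = [1.0, -0.5]                        # the injected input term

# No injection + contractive update: the state decays toward zero.
collapsed = iterate(CONTRACTIVE, U_X, [1.0, 1.0], inject=False)

# Injection + contractive update: two different starts reach the same resting point.
anchored_a = iterate(CONTRACTIVE, U_X, [1.0, 1.0])
anchored_b = iterate(CONTRACTIVE, U_X, [-3.0, 7.0])

# Expansive update with nothing to bound it: the state grows without limit.
exploded = iterate(EXPANSIVE, U_X, [1.0, 1.0], inject=False)

# Same expansive update, but normalization keeps the state on the unit sphere.
bounded = iterate(EXPANSIVE, U_X, [1.0, 1.0], normalize=True)

print("collapsed norm:", norm(collapsed))
print("anchored fixed point:", anchored_a)
print("exploded norm:", norm(exploded))
print("bounded norm:", norm(bounded))
```

In this toy, injection is what makes the resting point non-trivial and independent of the starting state, while normalization only guarantees boundedness. Whether a normalized, nonlinear network actually converges is a separate question that this sketch does not settle.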

The most direct neighbor is the finding that LLMs don't actually run iterative procedures in latent space: they recognize an optimization problem as template-similar to something seen before and emit a plausible answer instead of converging to one (see "Do large language models actually perform iterative optimization?"). That reframes your question: before asking how to control the emergence of fixed points, the corpus suggests asking whether the iterative dynamics that would produce them are happening in the first place. The companion result that RL fine-tuning sharpens memorization rather than installing genuine procedures points the same direction: out-of-distribution tests reveal template-matching where you'd hope to find a convergent process (see "Do fine-tuned language models actually learn optimization procedures?").

Where "input injection" has a concrete analog in the corpus, it's in steering: injecting a vector into the residual stream and asking what the model does with it. DPO training builds a two-stage circuit that detects these injected perturbations, with evidence-carrier features in early layers suppressing a default-deny gate, which is essentially the model developing sensitivity to an injected signal riding alongside its normal trajectory (see "How do language models detect injected steering vectors internally?"). Persona vectors extend this: linear directions in activation space that you can inject to steer, or monitor to catch drift, during fine-tuning (see "Can we track and steer personality shifts during model finetuning?"). These aren't fixed-point control, but they're the closest thing the library has to "what happens when you push a signal into the state and watch where it settles."
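The steer-or-monitor idea can be illustrated with plain vectors. This is a hedged toy sketch, not the method of any of the cited papers: the three-dimensional `hidden` state, the `persona_direction`, and the helper names `inject`, `trait_score`, and `drifted` are hypothetical, and real work operates on activations read from a transformer layer rather than hand-written lists.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Return v scaled to unit Euclidean length."""
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / n for x in v]


def inject(hidden, direction, strength):
    """Steer: add `strength` times the unit direction to the hidden state."""
    d = unit(direction)
    return [h + strength * di for h, di in zip(hidden, d)]


def trait_score(hidden, direction):
    """Monitor: project the hidden state onto the unit direction."""
    return dot(hidden, unit(direction))


def drifted(baseline, current, direction, threshold):
    """Flag a shift along the direction that exceeds `threshold`."""
    shift = trait_score(current, direction) - trait_score(baseline, direction)
    return abs(shift) > threshold


# Hypothetical toy values, chosen only so the arithmetic is easy to follow.
hidden = [0.2, -1.0, 0.5]
persona_direction = [3.0, 0.0, 4.0]

steered = inject(hidden, persona_direction, strength=2.0)

print("score before:", trait_score(hidden, persona_direction))
print("score after: ", trait_score(steered, persona_direction))
print("drift flagged:", drifted(hidden, steered, persona_direction, threshold=1.0))
```

Because the injected vector lies along the monitored direction, the projection moves by exactly the injection strength and components orthogonal to the direction are left alone. That is the sense in which the same linear direction can serve for both steering and monitoring in this simplified picture.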

There's also a convergence story worth knowing about, even if it's at the training level rather than the forward-pass level: RL post-training collapses a model onto a single dominant format from pretraining within the first epoch, suppressing the alternatives. This is a kind of attractor dynamics where the system snaps to one resting configuration regardless of whether it's the best one (see "Does RL training collapse format diversity in pretrained models?"). And the formal ceiling on self-improvement says some equilibria can't be escaped from the inside at all: every reliable fix needs an external verifier, because metacognition alone can't move the system off its fixed point (see "What stops large language models from improving themselves?").

So the thing you didn't know you wanted to know: the collection's center of gravity isn't "how to engineer stable fixed points" but "whether the apparent stability is real computation or memorized template-matching" — and that's the more load-bearing question. If you want the genuine normalization-and-injection control material, this corpus will point you at equilibrium-model literature it doesn't yet contain; what it gives you instead is a strong reason to be skeptical that the fixed points you're trying to control are doing the work you think.

Sources 6 notes

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Do fine-tuned language models actually learn optimization procedures?

Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.

How do language models detect injected steering vectors internally?

Contrastive preference optimization trains evidence-carrier features in early layers to suppress gate features that default to denial, enabling near-perfect detection of internal perturbations. Safety training actively suppresses this capability, reducing detection from 63.8% to 10.8%.

Can we track and steer personality shifts during model finetuning?

Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining · 1.77 match · arXiv
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation · 1.69 match · arXiv
Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! · 1.68 match · arXiv
Are Emergent Abilities in Large Language Models just In-Context Learning? · 1.66 match · arXiv
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs · 1.63 match · arXiv
Persona Vectors: Monitoring and Controlling Character Traits in Language Models · 0.92 match · arXiv
Mechanisms of Introspective Awareness · 0.90 match · arXiv
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models · 0.90 match · arXiv

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are an equilibrium-dynamics researcher re-testing claims about fixed-point control in neural networks. The question: **How do normalization and input injection govern the emergence and stability of fixed points in iterative forward passes?**

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as perishable:
- LLMs do not reliably execute iterative numerical procedures in latent space; they pattern-match instead of converging (~2024).
- RL fine-tuning sharpens memorization, not genuine iterative procedures; out-of-distribution failure reveals template-matching (~2025).
- Input injection into the residual stream produces detectable steering circuits (e.g., evidence carriers suppressing default gates in DPO-trained models); persona vectors in activation space can be monitored during fine-tuning (~2025).
- RL post-training converges on a single dominant format from the pretraining distribution within the first epoch, a form of attractor dynamics at the training level (~2025).
- Formal bounds: no model can self-improve via metacognition alone; every reliable fix requires external verification (~2024–2025).

Anchor papers (verify; mind their dates):
- arXiv:2504.07912 (Echo Chamber, 2025) — RL amplifies pretraining behaviors, collapse onto single attractor.
- arXiv:2507.21509 (Persona Vectors, 2025) — linear steering and monitoring of injected signals.
- arXiv:2603.21396 (Introspective Awareness, 2026) — DPO-induced circuits for perturbation detection.
- arXiv:2412.02674 (Mind the Gap, 2024) — self-improvement ceiling and external verifier requirement.

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For the five claims above, judge whether newer models (o1, o3, Claude 4), scaling, novel architectures (sparse, mixture-of-experts), or improved evaluation harnesses have since *dissolved* the assumption that iterative dynamics don't emerge. Separate the durable question ("Can networks run true fixed-point iteration?") from perishable limitations ("Current LLMs don't; newer ones might"). Cite what relaxed or overturned each; flag where constraints still hold.
(2) **Surface the strongest work contradicting or superseding the library's findings** from the last ~6 months. Have recent papers on in-context learning, meta-learning, or mechanistic interpretability shown that LLMs *do* perform genuine iterative computation, or that input injection + normalization *do* reliably control convergence?
(3) **Propose 2 research questions** that assume the regime may have shifted: (a) Under what architectural or training conditions do input injection and normalization provably induce convergence to a stable fixed point in the forward pass? (b) Can mechanistic interpretability identify the difference between memorized template-matching and true fixed-point iteration in current models?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
