INQUIRING LINE

Ordinary training choices silently turn an autoencoder into a system with hidden gravity wells that nobody explicitly built.

How do encode-decode contractive biases create stable attractors in latent space?

This explores how an autoencoder's repeated encode-decode loop quietly turns into a dynamical system, one where ordinary training pressures, not deliberate design, pull points toward fixed destinations in latent space. The cleanest answer in the corpus is that if you take a trained autoencoder and iterate its encode-decode map, feeding the output back in again and again, the trajectories converge to fixed points, and those fixed points are attractors that nobody explicitly built (see: Do autoencoders learn hidden attractors in latent space?). The 'contractive bias' isn't a special loss term; it falls out of mundane choices like weight decay, initialization, and data augmentation, all of which gently shrink distances under the map so that nearby inputs flow into shared basins. The model behaves like a vector field even though it was only ever trained to reconstruct.
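
To make the iterated map concrete, here is a minimal sketch in Python with NumPy. It does not train an autoencoder: `W_enc`, `W_dec`, and `b` are hypothetical random stand-ins, rescaled by hand so that the composite encode-decode map is a global contraction (Lipschitz constant at most 0.9). That rescaling is an assumption standing in for the shrinking effect the note attributes to weight decay, initialization, and augmentation. Under it, the Banach fixed-point theorem guarantees exactly one fixed point, so two unrelated starting points should land in the same place.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3  # hypothetical input and latent dimensions

# Hypothetical stand-in weights (NOT a trained model).
W_enc = rng.normal(size=(k, d))
W_dec = rng.normal(size=(d, k))
b = 0.1 * rng.normal(size=d)

# Rescale by spectral norm so the encode-decode map is a contraction:
# tanh is 1-Lipschitz, so Lip(step) <= ||W_dec||_2 * ||W_enc||_2 = 0.9.
W_enc /= np.linalg.norm(W_enc, 2)
W_dec *= 0.9 / np.linalg.norm(W_dec, 2)

def encode(x):
    return np.tanh(W_enc @ x)

def decode(z):
    return W_dec @ z + b

def step(x):
    """One pass of the encode-decode map, fed back into input space."""
    return decode(encode(x))

def iterate(x, n_steps=200):
    """Apply the encode-decode map n_steps times."""
    for _ in range(n_steps):
        x = step(x)
    return x

# Two unrelated starting points.
x_a = iterate(5.0 * rng.normal(size=d))
x_b = iterate(5.0 * rng.normal(size=d))

residual = np.linalg.norm(step(x_a) - x_a)  # how close x_a is to a fixed point
gap = np.linalg.norm(x_a - x_b)             # whether both runs share an endpoint
print(f"fixed-point residual: {residual:.2e}")
print(f"gap between runs:     {gap:.2e}")
```

A global contraction is the simplest case and yields a single attractor. A trained autoencoder would more plausibly be contractive only locally, which is what allows several attractors with separate basins; this sketch does not model that.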

The interesting part is what determines *where* those attractors sit. The same note ties their character to the memorization-versus-generalization spectrum: an overfit model carves narrow basins around stored examples, while a generalizing one settles into smoother, broader ones. That maps neatly onto a separate finding that representational density itself is *learned*: networks build dense, confident activations for familiar data and fall back to sparse ones for unfamiliar inputs, purely through exposure during pretraining (see: Is representational sparsity learned or intrinsic to neural networks?). Read together, an attractor is less a geometric accident than a fossil of what the model saw a lot of: familiarity sculpts the wells, and the iterated map just rolls downhill into them.

This connects to a surprising fact about what latent space looks like when you probe it. Networks don't store structure as a featureless blob; they spontaneously organize it. LLM activations encode syntactic type and direction in something like polar coordinates, using both distance and angle, without ever being told to (see: How do language models encode syntactic relations geometrically?). The lesson that travels back to autoencoders: contractive training doesn't just compress, it imposes geometry, and stable attractors are one signature of that self-organized structure. Stability and structure are two faces of the same learned latent field.

There's also a stability-as-adaptation thread worth following. When LLMs hit out-of-distribution inputs, their hidden states *sparsify*, and this acts as a selective filter that holds performance together rather than signalling a breakdown (see: Do language models sparsify their activations under difficult tasks?). So 'stable attractor' and 'adaptive collapse to a sparse code' may be describing the same instinct from different angles: the network defaulting to a safe, low-dimensional resting state when it's unsure. Attractors aren't only memory wells; they can be where a model goes to stay reliable.

For a cross-domain mirror, look at reinforcement learning. RL post-training reliably collapses a model's many pretraining output formats down to one dominant format within the first epoch. This is an attractor in behavior space rather than latent space, and the winner is set by model scale, not necessarily by performance (see: Does RL training collapse format diversity in pretrained models?). The shared shape is convergence-without-intent: a training process, optimizing for something else entirely, quietly funnels a system toward a small set of stable outcomes. If you want a constructive counterpoint, one that treats the latent dimension as something you scale and steer rather than something that collapses on you, latent-thought models couple fast local and slow global learning to make latent size its own scaling axis (see: Can latent thought vectors scale language models beyond parameters?).

Sources 6 notes

Do autoencoders learn hidden attractors in latent space?

Iterating an autoencoder's encode-decode map reveals convergent trajectories with attractor points that emerge from training-induced contractive biases. These attractors arise naturally from initialization schemes, weight decay, and data augmentation—without explicit design—and their nature reflects the memorization-versus-generalization spectrum of the training regime.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

How do language models encode syntactic relations geometrically?

The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs (2.61 match)
Navigating the Latent Space Dynamics of Neural Models (1.69 match)
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control (1.68 match)
Semantic Structure in Large Language Model Embeddings (1.67 match)
Bigger is not always better: The importance of human-scale language modeling for psycholinguistics (1.67 match)
Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs (1.60 match)
Scalable Language Models with Posterior Inference of Latent Thought Vectors (0.93 match)
A polar coordinate system represents syntax in large language models (0.92 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a mechanistic AI researcher re-testing claims about latent-space dynamics in neural networks. The question: How do encode-decode contractive biases create stable attractors in latent space — and are those attractors truly *stable* under recent model scales, training regimes, and evaluation methods, or have they been dissolved or redirected?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026 and include:
- Iterated encode-decode maps converge to fixed-point attractors without explicit design; the 'bias' emerges from weight decay, initialization, and augmentation (~2025).
- Attractor location and basin width are sculpted by memorization versus generalization: overfit models carve narrow basins; generalizing ones settle smoother, broader wells (~2025).
- LLM activations spontaneously organize into polar-coordinate structure (type and direction), suggesting latent fields impose geometry independent of task (~2024-12).
- Under out-of-distribution shift, hidden states sparsify adaptively, acting as a selective filter that defaults to a low-dimensional 'safe' resting state (~2026-03).
- RL post-training converges on a single dominant pretraining format within one epoch — convergence-without-intent in behavior space (~2025-04).

Anchor papers (verify; mind their dates):
- arXiv:2412.05571 (2024-12): Polar coordinate system in LLM activations
- arXiv:2505.22785 (2025-05): Latent space dynamics of neural models
- arXiv:2504.07912 (2025-04): RL post-training attractor collapse
- arXiv:2603.03415 (2026-03): OOD sparsification as adaptive mechanism

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every claim above, ask whether scale (e.g., frontier models post-2025), training methods (e.g., DPO, iterative preference tuning), or orchestration (long-context caching, multi-agent loops) have relaxed or overturned it. Separate the durable insight (e.g., "networks default to low-dimensional resting states") from the perishable boundary condition (e.g., "this happens only in small autoencoders"). State plainly where constraints still hold and where they've been resolved.
(2) **Surface CONTRADICTING or SUPERSEDING work** from the last ~6 months that suggests attractors are unstable, context-dependent, or actively harmful under scaling.
(3) **Propose 2 research questions** that assume the attractor regime may have shifted: e.g., "Do frontier-scale models exhibit multiple competing attractors that trade off under task switching?" or "Can deliberate latent-space steering methods *prevent* unwanted attractor convergence in reasoning tasks?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
