INQUIRING LINE


The 'closeness' scores AI models compute between concepts aren't just data — they're also fingerprints of arbitrary training choices.

What makes regularization an implicit factor in embedding geometry?

This explores how the regularization choices made during training—weight decay, latent penalties, structural constraints—quietly shape the geometry of learned embeddings, even though geometry feels like it should reflect 'the data' rather than a tuning knob.

Regularization, the set of penalties and constraints added during training to keep models well-behaved, ends up silently authoring the geometry of embedding spaces; that geometry does not simply emerge from the data. The sharpest version of the claim is that a metric everyone treats as objective isn't. When you fit a regularized linear model with a closed-form solution, the cosine similarities between its learned embeddings turn out not to be unique: they depend on the regularization choice rather than on any stable semantic structure, so the same data can yield different 'similarities' depending on a knob you set (see: Does cosine similarity actually measure embedding similarity?). The geometry you read off the model is partly a fingerprint of how you regularized it.
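The non-uniqueness can be seen in a few lines. The sketch below is a toy illustration, not the cited paper's experiment: `A` and `B` are hypothetical random user and item embeddings, and `D` is an arbitrary diagonal rescaling of the latent dimensions. It assumes a regularizer that depends only on the product `A @ B.T`, in which case the rescaled pair scores exactly as well as the original pair and nothing in training pins `D` down.

```python
import numpy as np

def cosine_matrix(E):
    """Pairwise cosine similarities between the rows of E."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    return unit @ unit.T

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # hypothetical user embeddings (5 users, 3 latent dims)
B = rng.normal(size=(4, 3))  # hypothetical item embeddings (4 items, 3 latent dims)

# Rescale each latent dimension by an arbitrary positive factor.
# A picks up D and B picks up its inverse, so the product A @ B.T is unchanged.
D = np.diag([0.1, 1.0, 10.0])
A_scaled = A @ D
B_scaled = B @ np.linalg.inv(D)

# The model's predictions are identical under the rescaling...
same_predictions = np.allclose(A @ B.T, A_scaled @ B_scaled.T)

# ...but the item-item cosine similarities are not.
max_cosine_shift = np.abs(cosine_matrix(B) - cosine_matrix(B_scaled)).max()

print("predictions unchanged:", same_predictions)
print("largest change in an item-item cosine:", max_cosine_shift)
```

Two embedding tables that are indistinguishable by their predictions can therefore disagree about which items are 'close'. A penalty applied to `A` and `B` separately would break this symmetry, which is the sense in which the regularizer, not the data, settles the cosine scores.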

Why does this happen? Because regularization is a thumb on the scale of which solutions are reachable. Embedding spaces are wildly underdetermined: many configurations fit the data equally well, and the penalty term picks the winner. You can watch this concretely in autoencoders: iterating their encode-decode map reveals attractor points and convergent trajectories that nobody designed, arising directly from weight decay, initialization, and data augmentation (see: Do autoencoders learn hidden attractors in latent space?). The contractive bias that pulls points toward attractors *is* the regularization, made visible as shape. Strengthen or weaken it and the basins move.
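A toy version of "the basins move" fits in a scalar map. The sketch below is an assumption-laden stand-in for a real autoencoder: `iterate_map` and `gain` are hypothetical names, the encode-decode round trip is collapsed to x -> tanh(gain * x), and `gain` plays the role of the product of encoder and decoder weights that weight decay would shrink. It is meant only to show how the strength of a contractive bias changes where iteration ends up.

```python
import math

def iterate_map(x0, gain, steps=200):
    """Repeatedly apply a toy scalar encode-decode map x -> tanh(gain * x)."""
    x = x0
    for _ in range(steps):
        x = math.tanh(gain * x)
    return x

starts = (-0.8, 0.1, 0.8)

# Small gain (strong shrinkage): the map is a contraction, so every start
# is pulled to the single attractor at 0.
strong_decay = [iterate_map(x0, gain=0.5) for x0 in starts]

# Larger gain (weak shrinkage): 0 becomes unstable and two attractors appear,
# one for negative starts and one for positive starts.
weak_decay = [iterate_map(x0, gain=2.0) for x0 in starts]

print("gain 0.5:", strong_decay)
print("gain 2.0:", weak_decay)
```

The same starting points land in different places depending only on `gain`. Real autoencoders are high-dimensional and their attractors depend on the data as well, so this shows the mechanism rather than the cited result.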

The lever cuts both ways, which is why it's worth understanding rather than fearing. A single Gaussian-latent regularizer is enough to stop a JEPA from collapsing all its representations into a useless point, replacing six fiddly hyperparameters with one principled penalty that holds the geometry open (see: Can a single regularizer prevent JEPA representation collapse?). And structural constraints, a cousin of regularization, can be the whole story: ESLER's zero-diagonal rule (items can't predict themselves) forces prediction through inter-item relationships and beats most deep collaborative-filtering models, evidence that the imposed bias matters more than raw capacity (see: Can a linear model beat deep collaborative filtering?). Forcing sparsity onto weights likewise reshapes geometry into clean, modular, human-readable circuits that wouldn't form otherwise (see: Can sparse weight training make neural networks interpretable by design?).
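The zero-diagonal rule can be sketched as a ridge regression with a closed-form solution. The code below assumes the standard derivation for minimizing the squared reconstruction error of `X @ B` plus an L2 penalty on `B`, subject to a zero diagonal; whether this matches the source model's exact recipe is an assumption, and `zero_diagonal_item_weights`, `l2`, and the toy matrix `X` are hypothetical names and data chosen for illustration.

```python
import numpy as np

def zero_diagonal_item_weights(X, l2=1.0):
    """Closed-form ridge item-item weights B with diag(B) constrained to 0."""
    gram = X.T @ X + l2 * np.eye(X.shape[1])
    P = np.linalg.inv(gram)
    B = -P / np.diag(P)       # broadcasting divides column j by P[j, j]
    np.fill_diagonal(B, 0.0)  # an item may not predict itself
    return B

# Hypothetical binary user-item interactions (6 users, 4 items).
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
], dtype=float)

B_weak = zero_diagonal_item_weights(X, l2=0.1)
B_strong = zero_diagonal_item_weights(X, l2=10.0)

# Each user's scores come only from the *other* items they interacted with.
scores = X @ B_weak

print(np.round(B_weak, 2))
print(np.round(B_strong, 2))
```

With the diagonal pinned to zero, every score has to be routed through inter-item weights, including negative ones. Comparing `B_weak` with `B_strong` shows the learned item-item structure shifting with the penalty strength alone.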

The unsettling corollary is that the same regularizing pressures decide whether a model memorizes or generalizes, and they can hide as easily as they help. The attractor work frames its emergent geometry as sitting exactly on that memorization-versus-generalization spectrum (see: Do autoencoders learn hidden attractors in latent space?). Models trained with ordinary SGD can carry every linearly decodable feature a task needs while their internal organization is quietly fractured, a brittleness invisible to accuracy metrics but exposed by perturbation (see: Can models be smart without organized internal structure?). Even apparently 'intrinsic' properties like activation density turn out to be trained in, not given: networks learn dense codes for familiar data and fall back to sparse ones for the unfamiliar (see: Is representational sparsity learned or intrinsic to neural networks?).

What you didn't know you wanted to know: the structure people celebrate as 'emergent', such as polar coordinates encoding syntax (see: How do language models encode syntactic relations geometrically?) and eigenvectors that recover the WordNet hierarchy coarse-to-fine (see: Do embedding eigenvectors organize taxonomy from coarse to fine?), lives in a space whose ruler was set by regularization. The geometry is real, but the coordinate frame you measure it in is a choice, which is why a similarity score can be both meaningful and unstable at the same time.

Sources 9 notes

Does cosine similarity actually measure embedding similarity?

Regularized linear models with closed-form solutions show that cosine similarities between embeddings are not unique and depend on regularization choices made during training, not on actual semantic structure. This makes cosine scores unstable and potentially meaningless.

Do autoencoders learn hidden attractors in latent space?

Iterating an autoencoder's encode-decode map reveals convergent trajectories with attractor points that emerge from training-induced contractive biases. These attractors arise naturally from initialization schemes, weight decay, and data augmentation—without explicit design—and their nature reflects the memorization-versus-generalization spectrum of the training regime.

Can a single regularizer prevent JEPA representation collapse?

LeWorldModel trains a JEPA end-to-end using only next-embedding prediction and a Gaussian-latent regularizer, reducing tunable hyperparameters from six to one. The model achieves competitive control performance and 48× faster planning than foundation-model world models on a single GPU.

Can a linear model beat deep collaborative filtering?

ESLER, a single-layer linear autoencoder constrained so items cannot predict themselves, outperforms most deep CF models. The constraint forces prediction through item relationships, and negative weights encoding anti-affinity prove essential—structural bias matters more than model capacity.

Can sparse weight training make neural networks interpretable by design?

Training transformers with sparse weights creates compact, human-interpretable circuits where neurons correspond to simple concepts with clear connections. Ablation studies confirm these circuits are necessary and sufficient for task performance, though scaling beyond tens of millions of parameters while maintaining interpretability remains unsolved.


Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

How do language models encode syntactic relations geometrically?

The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.

Do embedding eigenvectors organize taxonomy from coarse to fine?

Leading eigenvectors of embedding Gram matrices separate broad taxonomic branches first, then progressively finer sub-branches—a coarse-to-fine spectral order that tracks the WordNet hypernym tree level by level, confirming predictions from co-occurrence statistics.

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about regularization's role in embedding geometry. The question remains open: does regularization author embedding space structure, or merely constrain solution selection within a fixed regime?

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2026; treat each as perishable until re-grounded.
• Cosine similarity of embeddings is not intrinsic to data; it depends on regularization choice, making 'similarity' unstable (2024).
• A single Gaussian latent regularizer prevents representation collapse in JEPAs, replacing six hyperparameters with one principled penalty (2026).
• Structural constraints (e.g., zero-diagonal in ESLER) force geometry through inter-item relationships and outperform deep models on collaborative filtering (2019).
• Weight sparsity produces interpretable, modular circuits that don't emerge without the imposed bias (2026).
• Activation density (sparse vs. dense codes) is learned via training-data familiarity, not intrinsic to architecture (2026).

Anchor papers (verify; mind their dates):
• arXiv:2403.05440 (2024): Is Cosine-Similarity of Embeddings Really About Similarity?
• arXiv:2511.13653 (2026): Weight-sparse transformers have interpretable circuits
• arXiv:2603.03415 (2026): Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
• arXiv:2605.27734 (2026): Learn from your own latents and not from tokens

Your task:
(1) RE-TEST EACH CONSTRAINT. For the claim that regularization *authors* geometry (not merely constrains it): does newer evidence from mechanistic interpretability, sparse autoencoders (SAEs), or multi-scale optimization (e.g., neural collapse, implicit bias theory) show that geometry is instead *intrinsic* to the loss landscape, and regularization merely selects a view? Cite what resolves or upholds the tension.
(2) Surface the strongest work from the last ~6 months that contradicts the "regularization is author" framing—e.g., claims that certain geometric properties (chirality, hierarchies, polar structure) emerge *despite* regularization choices.
(3) Propose 2 research questions that assume the regime has moved: (a) Can you design a regularizer-agnostic measure of embedding geometry robustness? (b) Do scaling laws flatten the dependence on regularization strength, collapsing the space of reachable geometries?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
