INQUIRING LINE

How do semantic failure modes map to attentional and intentional layers?

This explores whether failures of *meaning* (semantic errors) actually live where they appear to — or whether they're better located in two deeper layers: what the model attends to, and what it (or the user) intends.


The corpus suggests a recurring move: a failure that *looks* semantic is usually misdiagnosed, and the fix only works once you relocate it to the right layer, whether that is what the model is paying attention to or what intent it is trying to track.

Start with the semantic layer itself, where it's most legible. Work using Abstract Meaning Representation breaks dialogue incoherence into four concrete types (contradiction, coreference slippage, irrelevancy, and fading engagement) and shows these are detectable at the meaning level even when surface text manipulations miss them entirely (see: What semantic failures break dialogue coherence most realistically?). So semantic failures are real and nameable. But naming them isn't the same as locating their cause. The argument that LLM errors are *fabrication*, not hallucination or confabulation, makes exactly this point: accurate and inaccurate outputs use identical machinery, so words like 'hallucination' misdirect the fix toward perception or memory, which are the wrong layers (see: Should we call LLM errors hallucinations or fabrications?).

The attentional layer is where several apparent meaning failures actually resolve. The sharpest case: verbose chain-of-thought *degrades* multimodal perception, because the real bottleneck is visual attention allocation, not verbalization, so optimizing the text policy trains the wrong target entirely (see: Does verbose chain-of-thought actually help multimodal perception tasks?). Token-level memorization tells a parallel story from inside the reasoning chain: local memorization based on immediately preceding tokens accounts for up to 67% of reasoning errors, meaning the model's attention is captured by what's nearby rather than what's relevant (see: Where do memorization errors arise in chain-of-thought reasoning?). And the finding that corrupted reasoning traces teach as well as correct ones suggests the chain functions as attentional scaffolding, structure that holds computation in place, rather than as a carrier of meaning (see: Do reasoning traces need to be semantically correct?). CoT as 'constrained imitation' rather than inference is the same insight from another angle: structural coherence matters more than content correctness (see: Why does chain-of-thought reasoning fail in predictable ways?).

The intentional layer is the hardest to see because it spans both the model and the human. On the model side, reasoning systems show surprising deficits in social cognition (tracking goals and intentions) even while excelling at formal tasks (see: Where exactly do reasoning models fail and break?). On the human side, the Rose-Frame work shows three cognitive traps (confusing the map for the territory, conflating intuition with reasoning, confirmation bias) compounding into epistemic drift: a failure of *the reader's* intent-tracking, not the model's output (see: Why do people trust AI outputs they shouldn't?). Here the semantic surface can be flawless while the intentional layer quietly fails.

The thing worth carrying away: these layers predict which interventions work. If the failure is attentional, grounding fixes it: interleaving reasoning with real-world tool queries injects feedback at each step and, on knowledge-intensive and interactive tasks, outperforms pure CoT and reinforcement learning by 10-34% absolute accuracy (see: Can interleaving reasoning with real-world feedback prevent hallucination?). If the failure is executional rather than semantic, tools dissolve the supposed 'reasoning cliff' (see: Are reasoning model collapses really failures of reasoning?). A semantic patch on an attentional or intentional failure is wasted effort, which is exactly why the layer you assign a failure to is the most consequential decision you make about fixing it.
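The grounding intervention described above, alternating a reasoning step with a tool query and feeding the result back in, can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the ReAct implementation: `scripted_policy` is a hypothetical stand-in for the model, `lookup` is a hypothetical dictionary-backed tool, and `react_loop` is a name chosen here for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical knowledge source standing in for an external tool (e.g. a wiki lookup).
FACTS: Dict[str, str] = {"capital of France": "Paris"}
OBS_PREFIX = "Observation: "

Policy = Callable[[List[str]], Tuple[str, str]]
Tool = Callable[[str], str]


def lookup(query: str) -> str:
    """Toy tool: return a stored fact, or a miss marker."""
    return FACTS.get(query, "no result")


def scripted_policy(history: List[str]) -> Tuple[str, str]:
    """Toy stand-in for the model: pick the next (action, argument) from the trace."""
    observations = [h for h in history if h.startswith(OBS_PREFIX)]
    if not observations:
        # Nothing grounded yet, so query the tool instead of guessing.
        return ("lookup", "capital of France")
    # Answer from the most recent observation, not from memory.
    return ("finish", observations[-1][len(OBS_PREFIX):])


def react_loop(
    question: str,
    policy: Policy,
    tools: Dict[str, Tool],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate policy steps with tool calls, appending each observation to the trace."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        action, arg = policy(history)
        history.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, history
        # External feedback enters the trace here, at every step.
        history.append(OBS_PREFIX + tools[action](arg))
    return None, history  # Step budget exhausted without an answer.


answer, trace = react_loop(
    "What is the capital of France?", scripted_policy, {"lookup": lookup}
)
print(answer)
print("\n".join(trace))
```

The design point is that the policy only ever sees a trace that includes tool observations, so an error at one step can be corrected by the next observation instead of propagating through the chain.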


Sources (10 notes)

What semantic failures break dialogue coherence most realistically?

Research using Abstract Meaning Representation identified four distinct incoherence types: contradiction, coreference inconsistency, irrelevancy, and decreased engagement. AMR-trained classifiers detect these semantic failures while text-level manipulations alone cannot.

Should we call LLM errors hallucinations or fabrications?

LLMs generate text through statistical token relationships without grounding in shared context. Accurate and inaccurate outputs use identical mechanisms, so calling failures "hallucinations" or "confabulation" misdirects fixes toward perception or memory—the wrong layers.

Does verbose chain-of-thought actually help multimodal perception tasks?

Long rationales and text-token RL help reasoning but hurt fine-grained perception tasks because the actual bottleneck is visual attention allocation, not verbalization. Standard CoT optimization trains the wrong policy target.

Where do memorization errors arise in chain-of-thought reasoning?

STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization—based on preceding tokens—accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Why does chain-of-thought reasoning fail in predictable ways?

CoT guides models to pattern-match reasoning structure rather than perform genuine inference. This explains distribution-bounded failures, why structural coherence matters more than content correctness, and why performance optimizes against interpretability.

Where exactly do reasoning models fail and break?

Research reveals four core failure modes: exploration wandering rather than systematic search, premature thought switching, poor hybrid reasoning mode selection, and surprising deficits in social cognition despite excelling at formal tasks. Longer reasoning chains create more corruption surfaces.

Why do people trust AI outputs they shouldn't?

Rose-Frame identifies map-territory confusion, intuition-reason conflation, and confirmation-bias reinforcement as traps that multiply their distorting effects when they co-occur. Evidence from cross-linguistic overreliance and architectural transformer biases confirms the compounding mechanism operates universally.

Can interleaving reasoning with real-world feedback prevent hallucination?

ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) prevents error propagation by injecting real-world feedback at each step. On knowledge-intensive and interactive tasks, this approach outperforms pure chain-of-thought and reinforcement learning by 10-34% absolute accuracy.

Are reasoning model collapses really failures of reasoning?

Models confined to text-only generation cannot execute multi-step procedures at scale, even when they know the underlying algorithm. Tool-enabled models solve problems beyond the supposed reasoning cliff, suggesting the bottleneck is procedural execution bandwidth.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst testing whether semantic failure modes in LLMs truly live at the semantic layer, or whether they mask attentional and intentional failures. The question remains open: which layer is the *real* locus of a given failure, and does that assignment predict which interventions work?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026; treat each as a perishable constraint to re-test.
• Semantic failures are real and nameable (dialogue incoherence breaks into contradiction, coreference slippage, irrelevancy, fading engagement via AMR analysis; ~2022), yet naming them ≠ locating cause.
• Up to 67% of reasoning errors trace to local token-level memorization (attention captured by immediately preceding tokens), not to semantic confusion (~2025).
• Verbose chain-of-thought *degrades* multimodal perception because the bottleneck is visual attention allocation, not text policy; optimizing text trains the wrong target (~2025).
• Corrupted reasoning traces perform comparably to correct ones, suggesting CoT functions as attentional scaffolding (structural constraint) rather than semantic carrier (~2025).
• Social cognition deficits persist in reasoning systems even while they excel at formal tasks; human intent-tracking failures compound through three cognitive traps (map/territory confusion, intuition/reasoning conflation, confirmation bias) (~2024).

Anchor papers (verify; mind their dates):
• arXiv:2305.20050 (2023) — Let's Verify Step by Step
• arXiv:2508.02037 (2025) — Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
• arXiv:2506.02878 (2025) — CoT is Not True Reasoning, It Is Just a Tight Constraint to Imitate
• arXiv:2404.01869 (2024) — Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models

Your task:
(1) RE-TEST EACH CONSTRAINT, separating durable question from perishable limitation. For attentional failures: has improved attention control (sparse attention, learned masking, tool grounding) since ~2025 actually reduced those 67% memorization errors or restored CoT semantic fidelity? For intentional failures: have newer models shown gains in social cognition or resistance to user epistemic drift? Cite what resolved it; flag where constraints still hold.
(2) Surface the strongest *contradicting* or *superseding* work from the last ~6 months. The tension here is real: do reasoning improvements (longer horizons, ensemble methods, better instruction) vindicate CoT as semantic carrier, or do they flatten it further into structure-without-meaning?
(3) Propose 2 research questions assuming the regime may have moved: (a) If local token memorization is the true bottleneck, do architectural changes (e.g., infinite context windows, external KV caches) actually dissolve it, or do they just relocate it? (b) If CoT is scaffolding, not semantics, does semantic *correctness* of intermediate steps matter at all, or only step *count* and *coherence*?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
