INQUIRING LINE

Why do transformer models still miss implicit discourse relations in anxiety detection?

This explores why transformers struggle to read the *relationships between statements* — the causal, cross-sentence reasoning that signals anxiety — rather than just the emotional words on the surface.


The corpus points to a clear culprit: anxiety doesn't live in vocabulary, it lives in discourse. Anxious thinking shows up as overgeneralization built through chains of causal reasoning across statements: "because X happened, Y will fail, which means Z is hopeless." Discourse-level causal features predict anxiety more accurately than any single word, and the best results come from a dual model that reads both levels at once (see "Why do discourse patterns predict anxiety better than single words?"). A model tuned to spot worried-sounding words will keep missing the inter-statement logic that actually does the diagnostic work.
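To make the two levels concrete, here is a toy Python sketch that scores the same text once at the word level and once at the statement level. It is not the dual model from the cited note: the word list, the causal-marker list, and every function name are invented for illustration. The statement-level score only counts explicit markers such as "because", so a truly implicit relation would still slip past it.

```python
import re

# Hypothetical lists, invented for this sketch (not taken from any cited study).
WORRY_WORDS = {"anxious", "worried", "afraid", "panic", "hopeless"}
CAUSAL_MARKERS = re.compile(r"\b(because|therefore|which means|that means)\b")


def split_statements(text):
    """Split text into lowercase statements on sentence-ending punctuation."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def lexical_score(statements):
    """Word level: share of tokens found in the worry lexicon."""
    tokens = [t for s in statements for t in re.findall(r"[a-z']+", s)]
    if not tokens:
        return 0.0
    return sum(t in WORRY_WORDS for t in tokens) / len(tokens)


def causal_link_score(statements):
    """Statement level: share of statements carrying an explicit causal marker."""
    if not statements:
        return 0.0
    hits = sum(bool(CAUSAL_MARKERS.search(s)) for s in statements)
    return hits / len(statements)


def dual_features(text):
    """Return (word-level score, statement-level score) for one text."""
    statements = split_statements(text)
    return lexical_score(statements), causal_link_score(statements)


chain = ("I missed one deadline. Because I missed it, my boss will fire me. "
         "Which means I will never work again.")
print(dual_features(chain))
```

The example text contains no word from the worry list, so its word-level score is zero, while two of its three statements carry a causal marker. A classifier that sees only the first number would have nothing to go on.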

Why is that cross-statement layer so hard for transformers to pick up? Part of the answer is what training rewards. Models are optimized to predict information, not to track the implicit relational and structural work that holds a stretch of discourse together, which is exactly why they don't naturally develop conversation-maintenance moves like reference repair or topic hand-off (see "Why don't language models develop conversation maintenance skills?"). Implicit discourse relations belong to that same under-rewarded category: nobody labels them, and the training signal doesn't push the model to encode them, so they stay latent.

There's also a deeper representational story. Transformers tend to carry knowledge as continuous flow through the residual stream rather than as stored, addressable facts: knowledge that's contextual and inseparable from the act of generating (see "Do transformer models store knowledge or generate it continuously?"). And when context conflicts with strong priors learned in training, the priors win: models generate outputs inconsistent with what's actually in front of them, and prompting alone can't fix it (see "Why do language models ignore information in their context?"). An implicit discourse relation *is* a piece of context that has to be integrated against the model's defaults, so the same failure mode that makes models ignore their context shows up as missed inter-statement reasoning.

The clinical-dialogue notes sharpen the stakes. When transformers do try to read between the lines, they tend to over-read, injecting emotional interpretations the user never expressed (see "Do language models add feelings users never actually expressed?"), or they default to problem-solving the moment someone discloses emotion, a hallmark of low-quality therapy that is likely driven by RLHF's helpfulness bias (see "Do LLM therapists respond to emotions like low-quality human therapists?"). So it's not simply that models under-detect discourse signal; they're miscalibrated about it in both directions, hallucinating relations that aren't there while missing the ones that are.

The quietly surprising part: the discourse signal may already be inside the network, just not surfaced. Models trained with hidden chain-of-thought tokens compute correct answers in their earliest layers and then suppress them in the final layers to satisfy output formatting; the real computation is recoverable from lower-ranked predictions even when the final tokens are filler (see "Do transformers hide reasoning before producing filler tokens?"). That reframes the whole question: the fix for implicit discourse relations might be less about adding capability and more about not discarding the cross-statement reasoning the model already performs, whether through dual-level architectures (see "Why do discourse patterns predict anxiety better than single words?") or uncertainty-aware objectives that let a model abstain instead of guessing when the relation is ambiguous (see "Can models learn to abstain when uncertain about predictions?").
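The idea of recovering a signal from lower-ranked predictions can be sketched with a stripped-down logit lens: project each layer's hidden state through the unembedding matrix and rank the vocabulary. Everything below is a toy under stated assumptions. The three-token vocabulary, the 2-dimensional hidden states, and the matrix values are invented, no real model is loaded, and the sketch skips the final normalization step that a logit lens on a real transformer would usually apply before unembedding.

```python
def project(hidden, unembed):
    """Dot each vocabulary row of the unembedding matrix with the hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def ranked_tokens(hidden, unembed, vocab):
    """Return the vocabulary sorted from highest to lowest logit."""
    logits = project(hidden, unembed)
    order = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in order]


# Invented toy values: "..." stands in for a filler token, "seven" for the answer.
VOCAB = ["...", "seven", "three"]
UNEMBED = [
    [1.0, 0.0],    # row for "..."
    [0.0, 1.0],    # row for "seven"
    [-1.0, -1.0],  # row for "three"
]
HIDDEN_BY_LAYER = [
    [0.1, 0.9],  # early layer: the answer direction dominates
    [0.5, 0.6],  # middle layer: the filler direction is growing
    [1.0, 0.7],  # final layer: the filler direction wins
]

for layer, hidden in enumerate(HIDDEN_BY_LAYER):
    print(layer, ranked_tokens(hidden, UNEMBED, VOCAB))
```

In this toy, the early layer ranks the answer token first, while the final layer ranks the filler token first and leaves the answer in second place, which is the sense in which a suppressed computation can still be read off the lower ranks.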


Sources (8 notes)

Why do discourse patterns predict anxiety better than single words?

Causal explanations across statements—not individual words—are the strongest predictor of anxiety because anxious thinking involves overgeneralization through inter-statement reasoning. A dual model combining both representation levels outperforms either alone.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do language models add feelings users never actually expressed?

Therapists reviewing GPT-4 in the CaiTI system found it "reads into" user feelings rather than responding objectively. Task decomposition across specialized models (Reasoner/Guide/Validator) reduces but does not eliminate this interpretation bias.

Do LLM therapists respond to emotions like low-quality human therapists?

Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.
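As a rough illustration of what abstaining looks like at prediction time, the sketch below refuses to label a discourse relation when the predicted distribution is too flat. This is a post-hoc entropy threshold, not the uncertainty-aware training objective the note describes; the label set, the threshold of 0.5 nats, and the function names are assumptions made for the example.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_or_abstain(probs, labels, max_entropy=0.5):
    """Return the top label, or "ABSTAIN" when entropy exceeds the threshold."""
    if entropy(probs) > max_entropy:
        return "ABSTAIN"
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


# Hypothetical relation labels for a pair of adjacent statements.
LABELS = ["causal", "contrast", "none"]

print(predict_or_abstain([0.90, 0.05, 0.05], LABELS))
print(predict_or_abstain([0.40, 0.35, 0.25], LABELS))
```

The first distribution is peaked enough to commit to a label; the second is close to uniform, so the function abstains rather than guessing.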

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about transformer performance on implicit discourse relations in anxiety detection. The question: *Why do transformers still miss implicit discourse relations in anxiety detection?* remains open; the constraints below may have shifted.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026; treat as perishable:
- Discourse-level causal features predict anxiety *significantly more accurately* than lexical features alone; dual-level models outperform single-pathway designs (~2024).
- Transformers under-encode implicit discourse relations because training rewards information prediction, not relational tracking; conversation-maintenance moves (reference repair, topic hand-off) remain latent (~2023–2024).
- Transformers store knowledge as contextual flow in residual streams, not as addressable facts; when priors conflict with context, priors override — context integration fails (~2024).
- LLMs over-interpret user emotion (hallucinating relations) while defaulting to problem-solving on disclosure, a misalignment likely driven by RLHF helpfulness bias (~2024).
- Cross-statement reasoning computations exist in early transformer layers but are overwritten during output generation; recovery from lower-ranked predictions suggests the signal is *already present* (~2024–2025).

Anchor papers (verify; mind their dates):
- arXiv:2401.00820 (2024-01): Behavioral assessment of LLM therapists; establishes misalignment on emotion handling.
- arXiv:2412.04537 (2024-12): Hidden computations in chain-of-thought; demonstrates layer-wise reasoning recovery.
- arXiv:2507.21083 (2025-06): Emotional framing in ChatGPT; probes tone sensitivity and drift.
- arXiv:2507.03112 (2025-07): RLVER framework; tests verifiable emotion rewards as training fix.

Your task:
(1) RE-TEST EACH CONSTRAINT. For discourse-level features, priors-override-context, and layer-wise reasoning recovery: has newer training (RLVER, posterior inference of latent thought, verifiable reward signals) or architectural changes (memory-augmented discourse tracking, multi-layer routing, uncertainty-aware routing) since relaxed the core limitation? Separate the durable question (implicit relations as a *structural* challenge across discourse domains) from perishable assumptions (that single-pass generation cannot surface them).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months. Has any paper shown transformers *do* track discourse relations under specific conditions (e.g., fine-tuning on repair, multi-turn memory, explicit discourse labels), or that the problem is *not* missing computation but output routing?
(3) Propose 2 research questions that assume the regime has moved: (a) If discourse relations are recoverable in hidden layers, what training objective would *surface* them without degrading information prediction? (b) Can a verifiable emotion reward (as in RLVER) extend to causal discourse chains, and does it close the gap between layer-wise reasoning and final output?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
