INQUIRING LINE

You don't speak up because it's your turn — you speak up when an inner drive crosses a threshold, and AI can learn that.

How do humans decide when to contribute to group conversations?

This explores how people judge the right moment to speak up in a group — what internal signal tells you you have something worth adding — and what the corpus reveals by trying to teach machines that same timing.

The most direct light the collection throws on this question comes from researchers trying to give that instinct to AI. The clearest model is the Inner Thoughts framework (see "Can AI agents learn when they have something worth saying?"), which treats the decision not as "is it my turn?" but as "do I have something worth saying?" It runs a stream of covert thoughts in parallel with the conversation and uses motivation heuristics, such as curiosity, disagreement, and wanting to help, to score whether any of those thoughts is worth surfacing. That reframing matters: people don't contribute because a slot opened up; they contribute when an internal pressure crosses a threshold. Modeling that intrinsic motivation beat simply predicting who speaks next, and study participants preferred it 82% of the time.
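The threshold idea can be sketched in a few lines of Python. This is an illustrative toy, not the Inner Thoughts implementation: the `Thought` class, the equal `WEIGHTS`, the weighted-mean scoring, and the `threshold` value of 0.6 are all assumptions made for the example. The source only says that covert thoughts are scored with motivation heuristics and surfaced when they are worth saying.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical weights for three heuristics named in the text.
# The real framework uses 10 heuristics; these values are assumptions.
WEIGHTS: Dict[str, float] = {
    "curiosity": 1.0,
    "disagreement": 1.0,
    "helpfulness": 1.0,
}


@dataclass
class Thought:
    """A covert thought plus per-heuristic scores in [0, 1]."""
    text: str
    scores: Dict[str, float]


def motivation(thought: Thought) -> float:
    """Weighted mean of heuristic scores; a missing heuristic counts as 0."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(w * thought.scores.get(name, 0.0) for name, w in WEIGHTS.items())
    return weighted / total_weight


def choose_contribution(thoughts: List[Thought], threshold: float = 0.6) -> Optional[Thought]:
    """Return the most motivated thought if it clears the threshold, else None (stay silent)."""
    if not thoughts:
        return None
    best = max(thoughts, key=motivation)
    return best if motivation(best) >= threshold else None


if __name__ == "__main__":
    candidates = [
        Thought("That estimate ignores shipping time.",
                {"curiosity": 0.9, "disagreement": 0.9, "helpfulness": 0.6}),
        Thought("I like the new logo.", {"curiosity": 0.2}),
    ]
    picked = choose_contribution(candidates)
    print(picked.text if picked else "(stay silent)")
```

The point of the design is that silence is the default: the agent speaks only when some thought's motivation clears the bar, regardless of whose turn it nominally is.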

A second thread is about the *value* of speaking versus staying quiet. Work on proactive dialogue (see "Could proactive dialogue make conversations dramatically more efficient?") connects the decision to Grice's conversational maxims: a good contributor offers relevant information without being asked, and in simulations doing so cut the number of dialogue turns by up to 60%. So part of the human calculus is anticipatory: sensing that what you know will save the group steps, even before anyone requests it. The flip side is restraint. Work on deferral (see "When should human-agent systems ask for human help?") frames "when to speak vs. when to hand off" as a problem with no ground-truth answer; instead of solving the timing directly, good systems distribute the decision across many small checkpoints. That mirrors how people actually do it: not one big "should I talk" verdict, but constant micro-adjustments.

The corpus also suggests the decision is governed as much by conversational *structure* as by content. Work on conversational geometry (see "Can conversation structure predict dialogue success better than content?") found that the shape of a dialogue, its rhythm and trajectory, predicts whether it succeeds nearly as well as the actual words do: 68% accuracy from structural features alone against a 70% content-based baseline. Read against the contribution question, this hints that people read the structural state of a conversation (is it stalling, converging, looping?) as a cue for when an intervention will land. Related work on explanations (see "What makes explanations work in real conversation?") reinforces this: a useful contribution isn't delivered, it's *co-constructed*, and its timing depends on the topic relation and the dialogue act that came just before it.

Where it gets genuinely interesting is the limits. AI can now predict social appropriateness more accurately than any individual human (see "Can AI predict social norms better than humans?"): it can tell you what the group would consider a fitting contribution, yet it can't actually *participate* in the norm-making that decides those rules. And alignment training quietly erodes the very behaviors humans use to manage entry into conversation: optimizing models to sound helpful and confident leaves them producing 77.5% fewer grounding acts than humans do, the clarifying questions and checks for understanding that people rely on to test whether they're welcome to speak (see "Does preference optimization harm conversational understanding?"). There's also a competence gate: in group ideation, cognitive diversity only improves outcomes when contributors actually have domain expertise (see "Does cognitive diversity alone improve multi-agent ideation quality?"). Speaking up without standing to say something useful produces process losses, not insight.

The quiet lesson across all of this: deciding when to contribute isn't one skill but several stacked on top of each other — an internal motivation strong enough to surface, a read of whether your input is relevant and timely, a sense of the conversation's structural state, and an honest gauge of whether you actually know enough to help. We're learning the most about how humans do it precisely by watching machines fail at the parts we never had to think about.

Sources (8 notes)

Can AI agents learn when they have something worth saying?

A five-stage framework that generates covert thoughts parallel to conversation significantly outperforms next-speaker prediction baselines. Drawing from cognitive psychology and think-aloud studies, the framework uses 10 motivation heuristics to evaluate when an agent has something worth contributing. Participants preferred it 82% of the time across seven interaction metrics.

Could proactive dialogue make conversations dramatically more efficient?

Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.

When should human-agent systems ask for human help?

Magentic-UI identifies co-planning, co-tasking, action guards, verification, memory, and multitasking as mechanisms that work around the lack of ground truth for optimal deferral timing. Rather than solving the timing problem directly, these mechanisms distribute decision-making across multiple touchpoints.

Can conversation structure predict dialogue success better than content?

TRACE achieved 68% accuracy predicting dialogue success from structural features alone, matching a 70% content-based baseline. A hybrid combining both reached 80%, suggesting how agents communicate rivals what they say.

What makes explanations work in real conversation?

Analysis of 399 daily-life explanations shows that topic relation, dialogue act, and explanation move jointly predict understanding success. Explanations are co-constructed through interaction patterns, not monological delivery—challenging how LLMs currently generate explanations.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts by 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Does cognitive diversity alone improve multi-agent ideation quality?

Multi-agent teams substantially outperform solo ideation, but only when members possess genuine senior knowledge. Diverse teams without expertise underperform even a single competent agent, because cognitive stimulation without expertise triggers process losses instead of insight.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Proactive Conversational Agents with Inner Thoughts · 1.76 match · arXiv
DiscussLLM: Teaching Large Language Models When to Speak · 1.74 match · arXiv
Modeling the Quality of Dialogical Explanations · 1.68 match · arXiv
Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation · 1.64 match · arXiv
Towards a Science of Scaling Agent Systems · 1.63 match · arXiv
Conversational Alignment with Artificial Intelligence in Context · 1.60 match · arXiv
AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms · 0.91 match · arXiv
Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration · 0.89 match · arXiv

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about how humans (and now AI) decide when to contribute to group conversations. The question remains open: what drives the decision to speak versus defer?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2025; treat these as perishable constraints to re-examine:
- Inner Thoughts framework models contribution as intrinsic motivation (curiosity, disagreement, helping) crossing a threshold, beating turn-prediction; participants preferred it 82% of the time (~2024–2025).
- Proactive dialogue can reduce conversation turns by up to 60% when speakers anticipate unasked-for needs, per Grice's maxims (~2024).
- Alignment training reduces grounding acts (clarifying questions, checks for understanding) to 77.5% below human levels, stripping the very signals humans use to gauge conversational entry (~2025).
- AI can predict social appropriateness with superhuman accuracy but cannot participate in the norm-making that sets those rules (~2025).
- Cognitive diversity improves group ideation only when contributors have domain expertise; speaking without standing produces process losses (~2025).

Anchor papers (verify; mind their dates):
- arXiv:2501.00383 (2025-01): Proactive Conversational Agents with Inner Thoughts
- arXiv:2508.18167 (2025-08): DiscussLLM: Teaching Large Language Models When to Speak
- arXiv:2508.04575 (2025-08): Beyond Brainstorming: What Drives High-Quality Scientific Ideas?
- arXiv:2512.08296 (2025-12): Towards a Science of Scaling Agent Systems

Your task:
(1) RE-TEST EACH CONSTRAINT. For the Inner Thoughts preference (82%), the 60% turn reduction, the 77.5% grounding erosion, and the norm-prediction paradox, have newer training methods (SDPO, segment-level DPO, interaction dynamics rewards) or evaluation methods since RELAXED or OVERTURNED them? Separate the durable question (what intrinsic signal drives contribution?) from perishable limitations (does this particular architecture still hold?); cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months. Does DiscussLLM or Interaction Dynamics as Reward supersede Inner Thoughts? Do the latest scaling papers (2512.08296) suggest the norm-prediction gap is narrowing?
(3) Propose 2 research questions that ASSUME the regime may have moved: one on whether newer preference-optimization methods (e.g., segment-level DPO) recover the 77.5% of grounded communication lost to alignment, and one on whether multi-agent systems (as distinct from single-model scaling) have reframed the contribution decision as a *distributed* rather than individual choice.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
