INQUIRING LINE


AI models understand efficient shorthand perfectly — but they won't invent it themselves unless you explicitly tell them to.

Can multimodal LLMs be made to spontaneously adapt their language for efficiency?

This explores whether multimodal LLMs (GPT-4, Gemini, Claude) will, on their own, start compressing or streamlining their language the way humans naturally do to communicate more efficiently — and what it takes to make that happen.

The short answer the corpus gives is: not on their own. The most direct evidence is striking: GPT-4, Gemini, and Claude all *understand* efficient, compressed language well when they are the listener, but they don't *produce* it as speakers unless explicitly told to reduce message length and keep their word choices consistent, and even then the adaptation is only partial (see "Why don't LLMs shorten messages like humans do?"). There is a real gap between comprehension and generation: the models can decode shorthand, but they won't invent the conventions themselves. Efficiency has to be instructed, not discovered.

Why would that be? A second thread in the collection points at something deeper than a missing skill: a frozen communicative identity. Alignment (system prompts plus RLHF training) tends to lock a model into one fixed register that it carries across every interaction, which is the opposite of the contextual register-switching that lets humans dial their language up or down to fit a partner (see "Can language models adapt communication style to different contexts?"). If a model can't fluidly switch registers, it's no surprise it can't spontaneously develop a clipped, efficient one. The same rigidity shows up when people try to push models into different personalities: most open models stubbornly retain their trained defaults and resist conditioning, with only a few flexible ones adapting (see "Can open language models adopt different personalities through prompting?"). Adaptation that humans do reflexively keeps turning out to be something these models do only under explicit pressure.

Efficient language conventions are something two partners *build together* over a conversation, and here the corpus surfaces a structural reason this is hard for LLMs. Forming shared shorthand requires jointly updating common ground, but LLMs interpret every later turn through the frame of the initial prompt and can't symmetrically propose updates to shared assumptions; the human ends up being the sole keeper of the conversational scoreboard (see "Can LLMs truly update shared conversational common ground?"). Because efficient conventions emerge from that two-way negotiation, a model that can't co-maintain common ground is missing the very mechanism by which human efficiency arises. Relatedly, models tend to lock into premature assumptions early in multi-turn exchanges and struggle to recover, which further undercuts the iterative convergence that shorthand depends on (see "Why do language models fail in gradually revealed conversations?").

The interesting twist is that adaptation *is* achievable; it just has to be engineered rather than left to emerge. Test-time learning systems show models can genuinely adapt during inference, but only when scaffolded with structured self-dialogue and external conflict resolution; left autonomous, they fail (see "Can LLMs learn reliably at test time without human oversight?"). That mirrors the efficiency finding: explicit instruction produces partial adaptation, spontaneity produces none. And there may be a formal floor under all this: self-improvement in LLMs is bounded by a generation-verification gap, meaning a model can't reliably bootstrap a better convention without something external to validate and enforce it (see "What stops large language models from improving themselves?").

So the takeaway: the barrier to efficient language isn't that models can't *recognize* efficiency; they can. It's that they lack the machinery humans use to *develop* it: register-switching, jointly built common ground, and self-driven convention formation. Efficiency in these systems is a knob you turn from the outside, not a habit they grow into.

Sources (7 notes)

Why don't LLMs shorten messages like humans do?

GPT-4, Gemini, and Claude understand efficient language as listeners but don't produce it as speakers. Only explicit instruction to reduce message length and maintain lexical consistency produces partial adaptation, revealing a gap between comprehension and generation.
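The two behaviours named in this note, shorter messages and consistent word choice, can be made concrete with a small measurement sketch. The metrics below (word count per round, and the share of a message's vocabulary reused from the previous round) and the example utterances are illustrative assumptions, not the measures or data used in the underlying study.

```python
from typing import List

PUNCT = ".,;:!?\"'()"

def tokens(message: str) -> List[str]:
    """Lowercased words with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in message.split())
    return [w for w in words if w]

def length_trend(messages: List[str]) -> List[int]:
    """Word count of each successive message about the same referent."""
    return [len(tokens(m)) for m in messages]

def lexical_reuse(messages: List[str]) -> List[float]:
    """For each message after the first, the fraction of its distinct
    words that already appeared in the previous message."""
    scores = []
    for prev, curr in zip(messages, messages[1:]):
        prev_vocab, curr_vocab = set(tokens(prev)), set(tokens(curr))
        if not curr_vocab:
            scores.append(0.0)
            continue
        scores.append(len(curr_vocab & prev_vocab) / len(curr_vocab))
    return scores

# Hypothetical speaker turns describing the same image across three rounds.
human_like = [
    "the figure that looks like a dancer with one arm raised",
    "the dancer with the raised arm",
    "the dancer",
]
model_like = [
    "the figure that looks like a dancer with one arm raised",
    "the shape resembling a ballerina lifting a single arm upward",
    "a person-like form posed as if dancing with an arm up",
]

for name, turns in (("human-like", human_like), ("model-like", model_like)):
    print(name, length_trend(turns), lexical_reuse(turns))
```

In this toy data the human-like speaker shortens each round while reusing earlier words, whereas the model-like speaker keeps roughly the same length and rephrases from scratch. Real evaluations would need a more careful notion of lexical consistency than exact word overlap.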

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Why do language models fail in gradually revealed conversations?

Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
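To see how little the mitigations buy back, the figures above can be worked through on a normalized scale. This is a sketch under two assumptions not stated in the note: that the 39% drop is relative to the single-turn score, and that "recover 15-20% of this loss" means that fraction of the lost points is regained. The baseline of 100 is hypothetical.

```python
def multi_turn_score(baseline: float, relative_drop: float) -> float:
    """Score after a relative drop from the single-turn baseline."""
    return baseline * (1.0 - relative_drop)

def mitigated_score(baseline: float, relative_drop: float,
                    recovered_fraction: float) -> float:
    """Score after a mitigation regains a fraction of the lost points."""
    loss = baseline * relative_drop
    return baseline - loss + loss * recovered_fraction

BASELINE = 100.0   # hypothetical single-turn score
DROP = 0.39        # average drop reported in the note

unmitigated = multi_turn_score(BASELINE, DROP)
low = mitigated_score(BASELINE, DROP, 0.15)
high = mitigated_score(BASELINE, DROP, 0.20)

print(f"multi-turn: {unmitigated:.2f}")
print(f"with mitigation: {low:.2f} to {high:.2f}")
```

Under these assumptions a model scoring 100 single-turn falls to 61 in multi-turn settings, and mitigation lifts it only to roughly 67 to 69, still about 31 to 33 points below the baseline.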


Can LLMs learn reliably at test time without human oversight?

ARIA demonstrates that LLMs can adapt during inference through three integrated components: structured self-dialogue for uncertainty assessment, timestamped knowledge bases for conflict detection, and human-mediated resolution queries. Autonomous systems fail at reconciling contradictory rules because the correct choice depends on context outside the system.
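The conflict-detection and human-mediated resolution steps described here can be sketched as a tiny data structure. Everything below is a hypothetical illustration of the idea, not ARIA's implementation: the class names, the `learn` and `resolve` methods, and the example rule are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

@dataclass
class Rule:
    value: str
    learned_at: datetime  # timestamp kept so conflicts can be reported with provenance

@dataclass
class TimestampedKB:
    rules: Dict[str, Rule] = field(default_factory=dict)
    # Each pending query is (key, stored value, proposed value).
    pending: List[Tuple[str, str, str]] = field(default_factory=list)

    def learn(self, key: str, value: str,
              now: Optional[datetime] = None) -> bool:
        """Store a rule, or queue a query for a human if it contradicts
        an existing rule. Returns True if the rule was stored."""
        now = now or datetime.now(timezone.utc)
        existing = self.rules.get(key)
        if existing is not None and existing.value != value:
            # Do not auto-resolve: the right answer may depend on
            # context the system cannot see.
            self.pending.append((key, existing.value, value))
            return False
        self.rules[key] = Rule(value, now)
        return True

    def resolve(self, key: str, chosen: str,
                now: Optional[datetime] = None) -> None:
        """Apply a human decision and clear queries about that key."""
        now = now or datetime.now(timezone.utc)
        self.rules[key] = Rule(chosen, now)
        self.pending = [q for q in self.pending if q[0] != key]

kb = TimestampedKB()
kb.learn("reply_style", "full sentences")
accepted = kb.learn("reply_style", "terse shorthand")  # contradicts the stored rule
print(accepted, kb.pending)
kb.resolve("reply_style", "terse shorthand")           # human picks the winner
print(kb.rules["reply_style"].value, kb.pending)
```

The design choice worth noting is that a contradiction never overwrites the stored rule silently; it waits for an external decision, which matches the note's point that autonomous reconciliation fails.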

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Papers this line draws on (8)

The research behind the notes this line reads — ranked by how closely each paper relates.

Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation (3.40 match, arXiv)
Conversational Alignment with Artificial Intelligence in Context (2.59 match, arXiv)
LLMs Get Lost In Multi-Turn Conversation (2.55 match, arXiv)
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs (1.70 match, arXiv)
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs (1.69 match, arXiv)
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey (1.69 match, arXiv)
Task-Oriented Dialogue with In-Context Learning (1.67 match, arXiv)
PersLLM: A Personified Training Approach for Large Language Models (1.66 match, arXiv)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether multimodal LLMs spontaneously adapt their language for efficiency—a claim about *generation* vs. *comprehension* gaps and frozen communicative identity. Findings below span 2023–2026; treat them as dated constraints to verify, not current truth.

What a curated library found — and when (dated claims, not current truth):
• GPT-4, Gemini, Claude *comprehend* efficient compressed language but do not *produce* it without explicit instruction (~2024–08, arXiv:2408.01417).
• Alignment training via system prompts + RLHF locks models into one static register, preventing the contextual register-switching humans use to dial efficiency up/down (~2025–05, arXiv:2505.22907).
• LLMs cannot jointly update common ground mid-conversation; humans remain sole keepers of conversational scoreboard, blocking two-way shorthand negotiation (~2025–05, arXiv:2505.22907).
• Models lock into premature assumptions in multi-turn exchanges and cannot recover, undercutting iterative convergence shorthand depends on (~2025–05, arXiv:2505.06120; ~2026–02, arXiv:2602.07338).
• Test-time learning achieves partial adaptation only when scaffolded with structured self-dialogue + external validation; autonomous adaptation fails (~2025–07, arXiv:2507.17131).

Anchor papers (verify; mind their dates):
• arXiv:2408.01417 (2024–08): in-context conversational adaptation in multimodal LLMs.
• arXiv:2505.22907 (2025–05): conversational alignment, static identity.
• arXiv:2412.02674 (2024–12): self-improvement capabilities & generation-verification gap.
• arXiv:2507.17131 (2025–07): test-time learning with human-in-the-loop.

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding, judge whether newer instruction-tuning (e.g., chain-of-thought compression, dynamic prompting), in-context adaptation APIs, multi-agent setups with negotiation harnesses, or evals of efficiency in long-horizon tasks have since RELAXED the static-register or common-ground barriers. Separate the durable question—can models *discover* efficiency conventions without external signaling?—from perishable limits (e.g., older alignment methods). Cite what relaxed it; flag where constraints still hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months showing spontaneous register-switching, jointly-negotiated shorthand, or emergent efficiency conventions in multi-agent or long-horizon settings.
(3) Propose 2 research questions that ASSUME the regime may have shifted: (a) Do retrieval-augmented or dynamic-prompt models with explicit memory of partner preferences spontaneously compress? (b) In adversarial or turn-efficient settings (limited tokens, cost penalties), does pressure induce emergent efficient conventions without instruction?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
