Can multimodal LLMs be made to spontaneously adapt their language for efficiency?
This explores whether multimodal LLMs (GPT-4, Gemini, Claude) will, on their own, start compressing or streamlining their language the way humans naturally do to communicate more efficiently — and what it takes to make that happen.
The short answer the corpus gives is: not on their own. The most direct evidence is striking: GPT-4, Gemini, and Claude all *understand* efficient, compressed language perfectly well when they're the listener, but they don't *produce* it as speakers unless you explicitly tell them to reduce message length and keep their word choices consistent, and even then the adaptation is only partial (see "Why don't LLMs shorten messages like humans do?"). There's a real gap between comprehension and generation: the models can decode shorthand, but they won't invent the conventions themselves. Efficiency has to be instructed, not discovered.
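One way to make the comprehension/production gap concrete is to track it over a repeated reference game. The sketch below is illustrative only: the instruction wording, the function names (`efficiency_instruction`, `mean_length`, `lexical_consistency`), the Jaccard-overlap measure, and the toy transcripts are all assumptions made for this example, not the cited study's materials or metrics. It shows how an explicit brevity-and-consistency instruction might be phrased, and how message length and round-to-round word reuse could be measured.

```python
import re


def efficiency_instruction(max_words: int = 8) -> str:
    """Hypothetical system-prompt text asking for short, consistent messages."""
    return (
        f"Describe the target in at most {max_words} words. "
        "Once you have used a word or phrase for an item, reuse it in later rounds."
    )


def tokenize(message: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude tokenizer for illustration."""
    return re.findall(r"[a-z0-9']+", message.lower())


def mean_length(messages: list[str]) -> float:
    """Average number of word tokens per message."""
    if not messages:
        raise ValueError("need at least one message")
    return sum(len(tokenize(m)) for m in messages) / len(messages)


def lexical_consistency(messages: list[str]) -> float:
    """Mean Jaccard overlap between the vocabularies of consecutive messages."""
    if len(messages) < 2:
        raise ValueError("need at least two messages to compare")
    overlaps = []
    for prev, curr in zip(messages, messages[1:]):
        a, b = set(tokenize(prev)), set(tokenize(curr))
        overlaps.append(len(a & b) / len(a | b))
    return sum(overlaps) / len(overlaps)


# Toy transcripts, invented for illustration: one item described over three rounds.
converging = [
    "the figure that looks like a dancer with one arm raised",
    "the dancer with the raised arm",
    "the dancer",
]
non_converging = [
    "the figure that looks like a dancer with one arm raised",
    "a shape resembling a person dancing with an arm lifted up",
    "the silhouette of someone mid-dance, one limb extended upward",
]

# Word counts per round for each transcript.
converging_lengths = [len(tokenize(m)) for m in converging]
non_converging_lengths = [len(tokenize(m)) for m in non_converging]
```

In this toy data the converging speaker's messages shrink round over round while reusing the same words, whereas the non-converging speaker stays long and keeps rephrasing. Real transcripts would need a proper tokenizer and many more items before either number meant much.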
Why would that be? A second thread in the collection points at something deeper than a missing skill: a frozen communicative identity. System prompts plus RLHF training tend to lock a model into one fixed register that it carries across every interaction, which is exactly the opposite of the contextual register-switching that lets humans dial their language up or down to fit a partner (see "Can language models adapt communication style to different contexts?"). If a model can't fluidly switch registers, it's no surprise it can't spontaneously develop a clipped, efficient one. The same rigidity shows up when people try to prompt models into different personalities: most open models stubbornly retain their trained defaults and resist conditioning, and only a few flexible ones adapt (see "Can open language models adopt different personalities through prompting?"). Adaptation that humans do reflexively keeps turning out to be something these models do only under explicit pressure.
Efficient language conventions are something two partners *build together* over a conversation, and here the corpus surfaces a structural reason why that is hard for LLMs. Forming shared shorthand requires jointly updating common ground, but LLMs interpret every later turn through the frame of the initial prompt and can't symmetrically propose updates to shared assumptions; the human ends up being the sole keeper of the conversational scoreboard (see "Can LLMs truly update shared conversational common ground?"). Efficient conventions emerge from that two-way negotiation, so a model that can't co-maintain common ground is missing the very mechanism by which human efficiency arises. Relatedly, models tend to lock into premature assumptions early in multi-turn exchanges and fail to recover, which further undercuts the iterative convergence that shorthand depends on (see "Why do language models fail in gradually revealed conversations?").
The interesting twist is that adaptation *is* achievable; it just has to be engineered rather than left to emerge. Test-time learning systems show models can genuinely adapt during inference, but only when scaffolded with structured self-dialogue and external, human-mediated conflict resolution; left autonomous, they fail (see "Can LLMs learn reliably at test time without human oversight?"). That mirrors the efficiency finding: explicit instruction produces partial adaptation, spontaneity produces none. And there may be a formal floor under all this: self-improvement in LLMs is bounded by a generation-verification gap, meaning a model can't reliably bootstrap a better convention without something external to validate and enforce it (see "What stops large language models from improving themselves?").
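The role of external conflict resolution can be sketched with a toy rule store. This is not ARIA's implementation or API: the `Rule` and `RuleStore` classes, the `Resolver` callback, and the example rules are hypothetical names invented here, and the "autonomous" failure is modelled simply as a conflict left unresolved. The sketch assumes each learned rule carries a timestamp and that two rules conflict when they share a key but disagree on the value.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    key: str        # what the rule is about, e.g. "message_style"
    value: str      # the learned convention
    timestamp: int  # when it was learned (any increasing counter)


# Stand-in for an external party (for example a human) who picks between two rules.
Resolver = Callable[[Rule, Rule], Rule]


class RuleStore:
    """Toy timestamped knowledge base that detects conflicting rules."""

    def __init__(self, resolver: Optional[Resolver] = None):
        self.rules: dict[str, Rule] = {}
        self.resolver = resolver
        self.unresolved: list[tuple[Rule, Rule]] = []

    def learn(self, new: Rule) -> bool:
        """Try to store a rule; return True if the store now holds `new`."""
        old = self.rules.get(new.key)
        if old is None or old.value == new.value:
            self.rules[new.key] = new
            return True
        # Conflict: same key, different value. Without an external resolver
        # the store keeps the old rule and records the open conflict.
        if self.resolver is None:
            self.unresolved.append((old, new))
            return False
        winner = self.resolver(old, new)
        self.rules[new.key] = winner
        return winner == new


verbose = Rule("message_style", "full descriptive sentences", timestamp=1)
terse = Rule("message_style", "short agreed labels", timestamp=2)

# No resolver: the conflict is detected but never settled.
autonomous = RuleStore()
autonomous.learn(verbose)
autonomous_adopted = autonomous.learn(terse)

# With a resolver (here a lambda standing in for a human who prefers the newer rule).
scaffolded = RuleStore(resolver=lambda old, new: new)
scaffolded.learn(verbose)
scaffolded_adopted = scaffolded.learn(terse)
```

The design choice worth noting is that the store never decides a conflict itself; which rule wins is supplied from outside, which is the point the paragraph above makes about context the system does not have.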
So the thing you didn't know you wanted to know: the barrier to efficient language isn't that models can't *recognize* efficiency — they can. It's that they lack the machinery humans use to *develop* it: register-switching, jointly-built common ground, and self-driven convention formation. Efficiency in these systems is a knob you turn from the outside, not a habit they grow into.
Sources (7 notes)
Why don't LLMs shorten messages like humans do? GPT-4, Gemini, and Claude understand efficient language as listeners but don't produce it as speakers. Only explicit instruction to reduce message length and maintain lexical consistency produces partial adaptation, revealing a gap between comprehension and generation.
Can language models adapt communication style to different contexts? System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
Can open language models adopt different personalities through prompting? Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
Can LLMs truly update shared conversational common ground? LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.
Why do language models fail in gradually revealed conversations? Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings due to locking into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
Can LLMs learn reliably at test time without human oversight? ARIA demonstrates that LLMs can adapt during inference through three integrated components: structured self-dialogue for uncertainty assessment, timestamped knowledge bases for conflict detection, and human-mediated resolution queries. Autonomous systems fail at reconciling contradictory rules because the correct choice depends on context outside the system.
What stops large language models from improving themselves? Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.