INQUIRING LINE

Can language models learn to form ad-hoc conventions through training?

This explores whether models can develop the negotiated, on-the-fly shared meanings that communicators normally coordinate between themselves — and the corpus suggests it depends sharply on whether you mean conventions of form or conventions of meaning.


Ad-hoc conventions are the improvised shared agreements (a coined term, a private shorthand, a register two parties settle into) that emerge in real interaction. The corpus splits the question cleanly along a fault line: models pick up conventions of *form* readily through training, but the negotiated, meaning-bearing kind runs into structural walls.

The pessimistic thread is the stronger one. Convention-forming is fundamentally a pragmatic act: you and your interlocutor coordinate on a meaning neither of you held alone. Bender & Koller's argument (Can language models learn meaning from text patterns alone?) is that meaning lives in the relation between expressions and communicative intent, and a model trained only on form-to-form prediction has no access to the shared attention that grounds an agreement. Worse, even when a convention is established *in the conversation*, the model tends to revert. Strong training-time priors override what's present in the context window, and textual prompting alone cannot reliably counter them (Why do language models ignore information in their context?), so a freshly negotiated usage gets steamrolled by the statistically dominant one. And the very capacity that would let a model adapt its register to a partner, pragmatic register-switching, is largely trained *out*: system prompts and alignment lock the model into one static communicative identity that users can't reshape through dialogue (Can language models adapt communication style to different contexts?).

There's a subtler problem underneath: convention-forming assumes a stable party to do the agreeing. Shanahan's 20-questions test (Do large language models actually commit to a single character?) shows the model isn't committing to one character but sampling from a superposition of characters consistent with the conversation so far, so "the model" you struck a convention with may not be the same one that answers next. A convention needs a counterparty who *holds* the agreement; a sampler doesn't.
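
A toy sketch makes the superposition point concrete. It illustrates the argument only; it is not Shanahan's experimental setup and does not model any real LLM. A hypothetical "player" never fixes a secret object in advance. Whenever it must reveal one, it samples uniformly among the candidates consistent with the answers already given, so regenerating the reveal can name different objects, each consistent with the transcript. The candidate list, attributes, and function names are invented for illustration.

```python
import random

# Hypothetical candidate "secret objects" and their yes/no attributes.
# Invented for illustration; not taken from Shanahan's paper.
CANDIDATES = {
    "apple":  {"alive": False, "edible": True,  "metal": False},
    "banana": {"alive": False, "edible": True,  "metal": False},
    "spoon":  {"alive": False, "edible": False, "metal": True},
    "cat":    {"alive": True,  "edible": False, "metal": False},
}


def consistent(history):
    """Return the candidates that agree with every (question, answer) pair so far."""
    return [
        name
        for name, attrs in CANDIDATES.items()
        if all(attrs[q] == a for q, a in history)
    ]


def reveal(history, rng):
    """Toy 'sampler' player (hypothetical): nothing is committed in advance.

    At reveal time it draws uniformly from whatever is still consistent,
    so two regenerations of the same turn can name different objects.
    """
    return rng.choice(consistent(history))


if __name__ == "__main__":
    # The transcript so far: the player has answered two questions.
    history = [("alive", False), ("edible", True)]
    rng = random.Random(0)
    reveals = {reveal(history, rng) for _ in range(20)}
    print(sorted(consistent(history)))  # ['apple', 'banana']
    print(sorted(reveals))  # every reveal agrees with the transcript
```

The design mirrors the argument in the text: an "agreement" struck with this player lives only in the transcript. Anything the history does not pin down is resampled on every turn.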

But flip to conventions of pure form and the picture brightens. DPO training on correct and incorrect examples drills in rigid output conventions, such as exact function-calling formats, where supervised fine-tuning alone underperforms (Can small models match large models on function calling?). That is essentially learning an arbitrary convention by being shown what violates it. And Transformer² composes task-specific expert behaviors on the fly at inference (Can models dynamically activate expert skills at inference time?), a hint that adaptive, situation-specific reconfiguration is mechanically possible, even if it isn't the same as negotiating meaning with a partner.
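
"Being shown what violates it" has a precise form in the DPO objective: the policy is pushed to raise the likelihood of a preferred output relative to a rejected one, each measured against a frozen reference model. The per-example loss is -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]). Below is a minimal plain-Python sketch, assuming you already have the summed log-probabilities of each complete output under the policy and the reference. The function names, example numbers, and β = 0.1 are hypothetical; a real training loop computes this over batches with automatic differentiation.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed without overflow for large |x|."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities (sketch).

    policy_* / ref_*: summed log-probs of the chosen (well-formed) and
    rejected (format-violating) outputs under the policy and reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # output over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical log-probs for a well-formed function call (chosen) and a
    # malformed one (rejected), e.g. a call missing its closing brace.
    before = dpo_loss(-12.0, -11.0, -12.0, -11.0)  # policy == reference: margin 0
    after = dpo_loss(-10.0, -15.0, -12.0, -11.0)   # policy now favors the valid format
    print(round(before, 4))  # 0.6931 (= log 2)
    print(after < before)    # True
```

The rejected example carries real weight: lowering the malformed output's likelihood widens the margin just as much as raising the well-formed one does. Plain SFT on the chosen outputs has no such term, which is one reading of why the note finds DPO better at rigid formats.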

The thing worth carrying away: "learn a convention" quietly bundles two very different feats. Internalizing an arbitrary regularity from training data is something models do well; that's what training *is*. Coordinating a *new* shared meaning live, with an intent-bearing partner, is the part the corpus keeps flagging as out of reach. The obstacle is not that the model can't be flexible, but that it reproduces familiar training-distribution patterns rather than inventing coordinations, the same imitation signature that makes chain-of-thought degrade under distribution shift (Does chain-of-thought reasoning reveal genuine inference or pattern matching?). Ad-hoc convention is exactly the distribution shift it's worst at.


Sources (7 notes)

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Can small models match large models on function calling?

Small models fine-tuned via DPO on correct and incorrect function-calling examples from a large teacher model achieve high accuracy on logical and mathematical tasks. DPO's explicit negative examples directly target the rigid output format failures where SFT alone underperforms.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
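
The Transformer² note's mechanism can be written out as a short derivation. This is a sketch of singular value fine-tuning as we understand it from the paper (arXiv:2501.06252); the symbols are ours, and the mixing weights α_k stand in for whichever adaptation strategy selects or blends the experts, so check the exact formulation against the paper.

```latex
\begin{aligned}
&W = U \Sigma V^\top, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad W \in \mathbb{R}^{m \times n},\; r = \min(m, n) \\
&W'_z = U \,\big(\Sigma \operatorname{diag}(z)\big)\, V^\top, \qquad \sigma'_i = \sigma_i z_i, \quad z \in \mathbb{R}^{r} \text{ (one expert vector)} \\
&z' = \sum_{k=1}^{K} \alpha_k z_k \quad \text{(blend of } K \text{ trained experts at inference)} \\
&\text{trainable parameters per matrix: } r \;\text{(SVF)} \quad \text{vs.} \quad r_{\text{LoRA}}\,(m + n) \;\text{(LoRA of rank } r_{\text{LoRA}})
\end{aligned}
```

Because each expert only rescales singular directions that already exist in W, experts can be blended by mixing their z vectors without touching U or V, which is one way to read the note's "composable" and "fewer parameters" claims.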

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether language models can learn to form ad-hoc conventions through training. The question remains open, but a curated library's findings—spanning 2023–2025—have sketched constraints that may no longer hold.

What a curated library found — and when (dated claims, not current truth):
• Models readily internalize *form* conventions (e.g., function-calling formats via DPO training) but struggle to negotiate *meaning-bearing* conventions live because form-only training gives them no access to communicative intent (Bender & Koller, 2020).
• Training-time priors override context-window signals, so freshly negotiated usage gets overwritten by statistically dominant patterns; separately, alignment locks models into one static register, blocking pragmatic register-switching (~2024).
• Models fail the 20-questions consistency test: they sample from a superposition rather than commit to a stable character, so "the model" you negotiate with may not be the same one answering next (~2024).
• Transformer² enables inference-time composition of task-specific behaviors via expert vectors, hinting at mechanically possible situational reconfiguration (~2025).
• Chain-of-thought is constrained imitation of reasoning form, not abstract inference; models degrade under distribution shift—and ad-hoc convention is exactly that shift (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2305.14825 (Large Language Models are In-Context Semantic Reasoners, 2023)
• arXiv:2410.12405 (ProSA: Prompt Sensitivity, 2024)
• arXiv:2501.06252 (Transformer²: Self-adaptive LLMs, 2025)
• arXiv:2506.02878 (CoT as Tight Constraint, 2025)

Your task:
(1) RE-TEST EACH CONSTRAINT. For meaning-bearing convention formation: has instruction-tuning, preference-alignment, or multi-turn RL since unlocked pragmatic intent-grounding? Has in-context learning or memory-augmentation made context override training priors? Has checkpoint-level or roll-out-level commitment reduced the sampling-superposition problem? Separate the durable question (can models *invent* shared meaning, not just copy it?) from perishable limits (can newer architectures *hold* a negotiated state?). Cite what resolved or persists.
(2) Surface the strongest work from the last 6 months that either CONTRADICTS the pessimistic thread (models *can* form conventions) or SUPERSEDES it (a different mechanism—e.g., retrieval-augmented dialogue, multi-agent ecosystems—makes the old constraint moot).
(3) Propose 2 research questions that ASSUME the regime may have moved: (a) Can multi-turn RL with a fixed partner *teach* a model to hold a convention across checkpoints? (b) Do mixture-of-experts or retrieval-based architectures escape the imitation ceiling that blocks ad-hoc meaning-making?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
