INQUIRING LINE

Try giving an open-source AI a new personality through prompting — most will snap back to the same default character every time.

Do open-source LLMs show different resistance patterns to persona prompting than closed models?

This explores whether open-source LLMs and closed/commercial models behave differently when you try to push a persona onto them through prompting — and the corpus reframes the question in a useful way.

The most direct evidence in the corpus concerns open models specifically: most of them are surprisingly stubborn. When researchers tried to condition open LLMs into different personalities, the majority refused, snapping back to an intrinsic ENFJ-like default baked in during training. Only a handful of flexible models actually took on the prompted personality, and even combining role-play with personality cues didn't fully override the resistance (see "Can open language models adopt different personalities through prompting?"). So 'resistance to persona prompting' isn't a bug here; it's a measurable, model-dependent trait.

The more interesting move is to ask *why* a model would resist at all, and here the corpus splits into two camps that cut across the open/closed line. One camp says personas are genuinely installed by post-training: a persona becomes a substrate-level disposition that holds up even under adversarial pressure, which is exactly what 'resistance' would look like from the outside (see "Are LLM personas realized or merely simulated through training?"). The opposing camp says there's no fixed character to resist *with*: a model holds a superposition of possible characters and samples one at generation time, so regenerating the same prompt yields different personas each time (see "Do large language models actually commit to a single character?"). If that view is right, what looks like 'resistance' in open models may instead be a strong default in the sampling distribution, not a principled refusal.
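To make the 'strong default in the sampling distribution' reading concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the persona names, the weights, and the `sample_personas` helper are hypothetical and do not come from any of the cited notes. It only shows that a heavily skewed distribution looks like refusal from the outside, even though nothing in the sampler is refusing.

```python
import random
from collections import Counter


def sample_personas(weights, n_draws, seed=0):
    """Draw n_draws persona labels from a fixed categorical distribution.

    weights maps a (hypothetical) persona label to its sampling weight.
    """
    rng = random.Random(seed)  # seeded so the toy run is repeatable
    names = list(weights)
    draws = rng.choices(names, weights=[weights[n] for n in names], k=n_draws)
    return Counter(draws)


# Assumed, made-up weights: most of the mass sits on the trained default.
toy_weights = {
    "default_assistant": 0.90,
    "prompted_persona": 0.07,
    "other": 0.03,
}

counts = sample_personas(toy_weights, n_draws=1000)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

In this toy, the prompted persona does appear on some regenerations, which is the pattern the superposition view would predict and a strict 'refusal' view would not.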

That distinction matters because the corpus shows persona prompting is shaky regardless of how open the weights are. Run the same persona prompt repeatedly and the variance *across runs* matches or exceeds the variance *across different personas*, meaning model uncertainty, not stable social knowledge, is driving the output (see "Why do LLM persona prompts produce inconsistent outputs across runs?"). That instability is a different failure mode from outright resistance: a resistant model ignores your persona, while an unstable one accepts it inconsistently. People who want reliable personas have started training the drift out: multi-turn RL aimed at persona consistency cuts drift by over 55%, which tells you consistency is something you have to engineer rather than something the base model gives you for free (see "Can training user simulators reduce persona drift in dialogue?").
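The across-run versus across-persona comparison can be sketched as a simple variance split. This is a minimal illustration, not the method used in the cited work: the `variance_split` helper, the persona names, and the scores are all made up, and it assumes each run of a persona prompt has already been reduced to a single numeric score (for example, a rating the model assigned).

```python
from statistics import mean, pvariance


def variance_split(scores_by_persona):
    """Return (across_run_variance, across_persona_variance).

    scores_by_persona maps a persona label to a list of numeric scores,
    one score per repeated run of the same persona prompt.
    """
    runs = list(scores_by_persona.values())
    # Variance over repeated runs of one persona, averaged over personas.
    across_runs = mean(pvariance(r) for r in runs)
    # Variance of the per-persona mean scores.
    across_personas = pvariance([mean(r) for r in runs])
    return across_runs, across_personas


# Made-up scores for illustration only.
scores = {
    "strict_teacher": [2.0, 4.0, 3.0, 5.0],
    "laid_back_friend": [3.0, 5.0, 3.0, 5.0],
    "skeptical_reviewer": [1.0, 3.0, 2.0, 4.0],
}

across_runs, across_personas = variance_split(scores)
print(f"across runs: {across_runs:.3f}")
print(f"across personas: {across_personas:.3f}")
if across_runs >= across_personas:
    print("Run-to-run noise matches or exceeds the persona effect.")
```

When the first number is at least as large as the second, swapping the persona changes the output no more than simply re-running the same prompt does.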

The honest answer to your literal question: the corpus has solid evidence that open models vary widely among *themselves* in persona-flexibility, but it doesn't run a clean head-to-head against closed models on this axis. What it offers instead is the better question underneath yours: resistance, instability, and sampling are three distinct behaviors that get lumped together as 'the model won't take the persona.' If you want to go deeper on the philosophical stakes of whether there's even a 'self' there to resist, the realized-persona note ("Are LLM personas realized or merely simulated through training?") and the superposition note ("Do large language models actually commit to a single character?") are the two doorways worth opening.

Sources 5 notes

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.
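As a rough sketch of how three consistency scores might be folded into one reward, and how a relative drift reduction such as "over 55%" is computed, consider the Python below. It rests on stated assumptions: each metric is taken to be already scored in [0, 1] by some external judge, the weights are equal, and the names `ConsistencyScores`, `consistency_reward`, and `relative_drift_reduction` are hypothetical. The note does not say how the original work computes or combines these signals.

```python
from dataclasses import dataclass


@dataclass
class ConsistencyScores:
    # Field names follow the three metrics named in the note; how each
    # score is produced is not specified there and is assumed external.
    prompt_to_line: float
    line_to_line: float
    qa: float


def consistency_reward(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three scores (equal weights are an assumption)."""
    parts = (scores.prompt_to_line, scores.line_to_line, scores.qa)
    if any(not 0.0 <= p <= 1.0 for p in parts):
        raise ValueError("each score must lie in [0, 1]")
    return sum(w * p for w, p in zip(weights, parts))


def relative_drift_reduction(drift_before, drift_after):
    """Fraction by which drift fell, relative to the starting level."""
    if drift_before <= 0:
        raise ValueError("drift_before must be positive")
    return (drift_before - drift_after) / drift_before


# Made-up numbers for illustration only.
reward = consistency_reward(ConsistencyScores(0.9, 0.6, 0.75))
reduction = relative_drift_reduction(drift_before=0.40, drift_after=0.18)
print(f"reward: {reward:.2f}, drift reduction: {reduction:.0%}")
```

Keeping the three scores separate until the final weighted sum makes it possible to see which kind of inconsistency is dragging the reward down.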

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst auditing whether open-source LLMs and closed models differ in persona-prompting resistance. The question remains open: does model transparency (weights accessible) correlate with susceptibility to persona conditioning?

What a curated library found — and when (dated claims, not current truth):
Findings span 2022–2026. The corpus reports:
• Most open-source LLMs resist persona prompting, reverting to an intrinsic ENFJ-like default; only a handful flexibly adopt prompted personas (2024-01, arXiv:2401.07115).
• Persona inconsistency across runs matches or exceeds variance across different personas, suggesting sampling distribution dominates over stable character (2025-06, arXiv:2507.21083).
• Multi-turn RL for persona consistency reduces drift by over 55%, implying consistency must be engineered after the fact rather than emerging from base weights (2025-10, arXiv:2511.00222).
• Open/closed distinction does NOT map cleanly to persona flexibility; the corpus lacks direct head-to-head comparison (2026-01, arXiv:2601.10387).
• Three failure modes blur together: resistance (ignoring persona), instability (inconsistent adoption), and sampling variance (distribution default) (2025-06, arXiv:2507.21083).

Anchor papers (verify; mind their dates):
• arXiv:2401.07115 (2024-01) — Open Models, Closed Minds? Direct test of open LLM persona flexibility.
• arXiv:2511.00222 (2025-10) — Multi-Turn RL persona consistency; engineering-centric framing.
• arXiv:2601.10387 (2026-01) — The Assistant Axis; situates default persona across model families.
• arXiv:2507.21083 (2025-06) — Emotional framing instability; sampling vs. commitment distinction.

Your task:
(1) RE-TEST THE OPEN/CLOSED SPLIT. The library's direct evidence covers open models only, so any claim that open models resist more than closed ones rests on unequal comparisons. Since 2026-01, has any work run paired evals (same persona prompt, same eval metric) on open Llama/Mistral and closed GPT-4o/Claude variants? If so, does the gap hold, shrink, or reverse? Separately: does post-training method (RLHF vs. DPO vs. constitutional AI) predict persona flexibility better than weight access? Flag whether the constraint 'open = resistant' is real or an artifact of non-standardized testing.

(2) Surface contradicting work. Identify papers (last 6 months) challenging the "resistance as trait" framing—especially work suggesting personas are purely prompt-dependent, or that newer open models (Llama 3.3+, Qwen) show persona fluidity rivaling closed models. Also flag any replication failures of the 55% RL drift reduction.

(3) Propose two research questions assuming the regime shifted:
   – *If* newer open models are as flexible as GPT-4o at persona adoption, what training signal (curriculum, synthetic data, in-context example density) closed the gap?
   – *If* sampling variance dominates over character, can we design prompts that collapse the superposition (force a single persona at generation time) without fine-tuning?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
