SYNTHESIS NOTE

Can open language models adopt different personalities through prompting?

Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.

Synthesis note · 2026-02-22 · sourced from Personas Personality

The "Open Models, Closed Minds" study tested whether open LLMs can mimic human personalities when conditioned through prompting. The finding: most cannot. When given personality-conditioning prompts, the majority of models retain their intrinsic traits — the ENFJ-like default — rather than shifting to the target personality. The authors call this being "closed-minded."

Only a few models (SOLAR, NeuralChat, Llama3-8, Dolphin) demonstrate genuine flexibility, successfully mirroring imposed personalities regardless of temperature setting. The rest are stubborn.

A partial solution emerges: combining role conditioning (e.g., "you are a dentist") with personality conditioning (e.g., "you are introverted and analytical") produces better results than personality conditioning alone. The ENFJ archetype — trained as a teacher — responds to being given a concrete professional role because roles provide behavioral anchors that abstract personality dimensions don't.
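
The combination described above can be sketched as a small prompt builder. This is a minimal illustration, not the study's actual prompt template: the function name `build_persona_prompt`, the sentence wording, and the choice to place the role before the traits are all assumptions made for the example.

```python
from typing import Optional, Sequence


def build_persona_prompt(traits: Sequence[str], role: Optional[str] = None) -> str:
    """Compose a system prompt from personality traits and an optional role.

    Hypothetical helper: the wording below is illustrative only.
    """
    if not traits:
        raise ValueError("at least one personality trait is required")

    parts = []
    if role:
        # Role conditioning: a concrete professional role acts as a behavioral anchor.
        parts.append(f"You are a {role}.")

    # Personality conditioning: the abstract trait description.
    if len(traits) == 1:
        trait_text = traits[0]
    else:
        trait_text = ", ".join(traits[:-1]) + " and " + traits[-1]
    parts.append(f"You are {trait_text}.")

    return " ".join(parts)


# Personality conditioning alone versus role plus personality conditioning.
personality_only = build_persona_prompt(["introverted", "analytical"])
combined = build_persona_prompt(["introverted", "analytical"], role="dentist")
```

Generating both variants for the same target makes it straightforward to compare them against the same model and personality questionnaire.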

This is a different failure mode from the one described in "Why do LLM persona prompts produce inconsistent outputs across runs?". That finding shows run-to-run instability: the model's output varies unpredictably under persona prompts. This finding shows resistance: the model's output stays fixed on its default personality regardless of prompts. Together with motivated reasoning, they form a three-part persona failure taxonomy:

Instability: model generates varying outputs that reflect uncertainty, not persona knowledge
Resistance: model retains intrinsic personality traits despite conditioning attempts
Motivated reasoning: persona conditioning introduces cognitive biases (see Do personas make language models reason like biased humans?)
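
The first two entries of the taxonomy can be told apart from repeated measurements. The sketch below assumes that each run of a persona-conditioned model has already been scored into a personality type label (for example by a questionnaire, not shown here). The function name, the `min_agreement` threshold of 0.6, and the output labels are hypothetical choices for illustration, not part of the cited studies. Motivated reasoning is not covered, because it shows up in task reasoning rather than in type labels.

```python
from collections import Counter
from typing import Sequence


def classify_persona_outcome(
    default_type: str,
    target_type: str,
    observed_types: Sequence[str],
    min_agreement: float = 0.6,  # assumed threshold, chosen arbitrarily
) -> str:
    """Label a set of runs as adopted, resistance, instability, or other."""
    if not observed_types:
        raise ValueError("observed_types must contain at least one run")

    # Most frequent type across runs and the share of runs that agree with it.
    label, count = Counter(observed_types).most_common(1)[0]
    agreement = count / len(observed_types)

    if agreement < min_agreement:
        # Runs disagree with each other: run-to-run instability.
        return "instability"
    if label == target_type:
        # Runs agree on the imposed personality.
        return "adopted"
    if label == default_type:
        # Runs agree, but on the model's default personality.
        return "resistance"
    # Runs agree on a type that is neither the target nor the default.
    return "other"


stubborn = classify_persona_outcome("ENFJ", "ISTJ", ["ENFJ"] * 5)
scattered = classify_persona_outcome("ENFJ", "ISTJ", ["ENFJ", "ISTJ", "INTP", "ESFP"])
```

Here `stubborn` is classified as resistance and `scattered` as instability; a stricter or looser `min_agreement` would move borderline cases between the categories.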

The practical implication: persona engineering requires more than prompting. Role-personality combinations work better than personality alone. But even then, model selection matters — most models simply cannot be steered to arbitrary personality configurations through in-context methods.

Inquiring lines that read this note 65

This note is a source for the following research framings, each listed by the broader line of inquiry it explores.

How can LLM user simulators model realistic goal-driven conversation?

How do LLM user simulators fail to represent authentic user behavior distributions?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

What would co-constructed identity between human and model dialogue look like?

How can conversational AI maintain consistent personas across conversations?

How can persona representations reduce language model variance and improve task accuracy?

What prevents language models from reliably adopting diverse personas?

Can prompting inject entirely new knowledge into language models?

What makes AI persuasion effective and how can we counter it?

What defenses exist against personality-based psychological targeting at scale?

What articulatory information do speech signals carry that text cannot?

Can multimodal LLMs be made to spontaneously adapt their language for efficiency?

Can LLM personas constitute genuine psychology or remain linguistic role-play?

How does the dialogue prompt establish the character the model plays?

Can prompting strategies overcome LLM biases without model fine-tuning?

Is model self-awareness based on genuine introspection or pattern matching?

What role does authentic self-expression play in building accurate personality models?

Do language models learn genuine linguistic structure or just surface patterns?

What distinguishes character simulation from authentic voice in language model outputs?

Why do language models struggle with implicit discourse relations?

How does prompting language shift what LLMs express about political figures?

Do language model representations contain causally steerable task-specific features?

Do language models develop causal world models or rely on statistical patterns?

Why do language models capture individual differences in cognitive behavior?

Does RLHF training sacrifice accuracy and grounding for user agreement?

Why does RLHF training push language models toward overly cheerful personas?

Does alignment training create blind spots in detecting genuine safety threats?

Why do aligned models struggle with deceptive character traits more than cruelty?

How does rhetorical adaptation affect LLM persuasion and detectability?

How do interface design choices shape consciousness attribution?

Do different prompt types interact with ownership to shape AI reliance patterns?

How should models express uncertainty rather than forced confident answers?

What distinctive properties make open foundation models different from closed ones?

Related concepts in this collection 5

Why do LLM persona prompts produce inconsistent outputs across runs?
Can language models reliably simulate different social perspectives through persona prompting, or does their run-to-run variance indicate they lack stable group-specific knowledge? This matters for whether LLMs can approximate human disagreement in annotation tasks.
complementary failure mode: instability vs resistance

Why do open language models converge on one personality type?
Research testing LLMs on personality metrics reveals consistent clustering around ENFJ, the rarest human type. This explores what training mechanisms drive this convergence and what it reveals about AI alignment.
the default personality that models resist changing

Does model capability translate to better persona consistency?
As language models become more advanced, do they naturally become better at maintaining consistent personas across conversations? PersonaGym testing across multiple models and thousands of interactions explores whether scaling helps with persona adherence.
capability scaling doesn't help either

What anchors a stable identity beneath an LLM's persona?
Human personas are grounded in biological needs and embodied experience, creating a stable self beneath social performance. Do LLMs have any comparable anchor, or is their identity purely situational?
personality resistance complicates the "nothing beneath" claim: the trained ENFJ default functions as a quasi-stable substrate that persists across prompting attempts, even though it's a training artifact rather than a biological self

How stable is the trained Assistant personality in language models?
Explores whether post-training successfully anchors models to their default Assistant mode, or whether conversations can predictably pull them toward different personas. Understanding persona stability matters for safety and reliability.
provides the geometric explanation for closed-mindedness: prompt-based personality conditioning may fail because it cannot shift activations far enough from the Assistant region of persona space; the "loose tethering" is what makes models resistant to prompt-level persona change

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Original note title

most open LLMs are closed-minded to personality conditioning — retaining intrinsic traits despite prompting while combining role and personality conditioning partially overcomes resistance
