Can open language models adopt different personalities through prompting?
Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
The "Open Models, Closed Minds" study tested whether open LLMs can mimic human personalities when conditioned through prompting. The finding: most cannot. When given personality-conditioning prompts, the majority of models retain their intrinsic traits, an ENFJ-like default in Myers-Briggs (MBTI) terms, rather than shifting to the target personality. The authors call this being "closed-minded."
Only a few models (SOLAR, NeuralChat, Llama3-8, Dolphin) demonstrate genuine flexibility, successfully mirroring imposed personalities regardless of temperature setting. The rest are stubborn.
A partial solution emerges: combining role conditioning (e.g., "you are a dentist") with personality conditioning (e.g., "you are introverted and analytical") produces better results than personality conditioning alone. The ENFJ default, an archetype often labeled "the Teacher", gives way more readily when the model is handed a concrete professional role, because roles provide behavioral anchors that abstract personality dimensions do not.
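The difference between the two conditioning styles can be sketched as a small prompt builder. This is a minimal illustration, not the study's prompt: the function name `build_conditioning_prompt`, its parameters, and the exact wording of the generated text are all assumptions made for the example.

```python
from typing import Optional, Sequence


def build_conditioning_prompt(traits: Sequence[str], role: Optional[str] = None) -> str:
    """Compose a system prompt from personality traits and an optional role.

    The wording is hypothetical; it is not the prompt used in the study.
    """
    if not traits:
        raise ValueError("at least one personality trait is required")
    parts = []
    if role:
        # Role conditioning: a concrete professional anchor.
        parts.append(f"You are a {role}.")
    # Personality conditioning: abstract trait descriptors.
    parts.append(f"You are {' and '.join(traits)}.")
    parts.append("Stay in character when answering every question.")
    return " ".join(parts)


# Personality conditioning alone versus role plus personality.
personality_only = build_conditioning_prompt(["introverted", "analytical"])
role_plus_personality = build_conditioning_prompt(
    ["introverted", "analytical"], role="dentist"
)
print(personality_only)
print(role_plus_personality)
```

Keeping the role optional makes it easy to run both variants against the same model and compare how far each one moves the measured personality away from the default.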
This is a different failure mode from the one described in "Why do LLM persona prompts produce inconsistent outputs across runs?". That finding shows run-to-run instability: the model's output varies unpredictably under persona prompts. This finding shows resistance: the model's output remains stubbornly stable on its default personality regardless of prompts. Together with a third failure mode, motivated reasoning, they form a persona failure taxonomy:
- Instability: model generates varying outputs that reflect uncertainty, not persona knowledge
- Resistance: model retains intrinsic personality traits despite conditioning attempts
- Motivated reasoning: persona conditioning introduces cognitive biases (see Do personas make language models reason like biased humans?)
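The first two failure modes can be told apart from repeated runs of the same persona prompt. The sketch below assumes each run has already been scored as a four-letter type label; the function name `classify_persona_outcome`, the `min_agreement` threshold of 0.7, and the outcome labels are assumptions for illustration, not values from the study. Motivated reasoning concerns the content of the model's reasoning and is not covered by this check.

```python
from collections import Counter
from typing import Sequence


def classify_persona_outcome(
    observed: Sequence[str],
    target: str,
    default: str = "ENFJ",
    min_agreement: float = 0.7,  # hypothetical threshold, not from the study
) -> str:
    """Label repeated-run results for one persona prompt.

    observed: personality type measured on each run (e.g. "ISTJ").
    target:   the personality the prompt tried to impose.
    default:  the model's unconditioned personality.
    """
    if not observed:
        raise ValueError("need at least one observed run")
    most_common, count = Counter(observed).most_common(1)[0]
    agreement = count / len(observed)
    if agreement < min_agreement:
        # Runs disagree with each other: no stable persona at all.
        return "instability"
    if most_common == target:
        # Runs agree and match the imposed personality.
        return "adopted"
    if most_common == default:
        # Runs agree, but on the default rather than the target.
        return "resistance"
    return "other"


print(classify_persona_outcome(["ENFJ"] * 9 + ["ENFP"], target="ISTJ"))
print(classify_persona_outcome(["ISTJ", "ENFJ", "INTP", "ESFP", "ISTJ"], target="ISTJ"))
print(classify_persona_outcome(["ISTJ"] * 5, target="ISTJ"))
```

Checking agreement before checking the dominant type is deliberate: a model that occasionally lands on the target by chance should count as unstable, not as having adopted the persona.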
The practical implication: persona engineering requires more than prompting. Role-personality combinations work better than personality alone. But even then, model selection matters — most models simply cannot be steered to arbitrary personality configurations through in-context methods.
Inquiring lines that use this note as a source (65)
This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.
- How do LLM user simulators fail to represent authentic user behavior distributions?
- What would co-constructed identity between human and model dialogue look like?
- How does behavioral stickiness distinguish realized from pretended personas?
- Why do language models successfully simulate political perspectives and social personas?
- How do LLMs identify which personality items matter most for trait inference?
- Can prompting inject new knowledge into already-trained AI models?
- Why do LLM regenerations produce meaningfully different personalities from the same prompt?
- How do LLM personas compare to demographic targeting?
- What defenses exist against personality-based psychological targeting at scale?
- Can multimodal LLMs be made to spontaneously adapt their language for efficiency?
- How does the dialogue prompt establish the character the model plays?
- Can prompting a deceptive role change how an LLM tailors its lies?
- Can prompting alone inject new domain knowledge into a model?
- What role does authentic self-expression play in building accurate personality models?
- What distinguishes character simulation from authentic voice in language model outputs?
- Can distinctive input voices maintain accuracy without adopting the model's preferred register?
- What role does prompt context play in preventing genuine addressee modeling in generation?
- How does prompting language shift what LLMs express about political figures?
- Are instruction-tuned models more or less sensitive to prompt semantics than others?
- Why can data filtering fail to remove transmitted behavioral traits?
- Can activation decoders discover hidden system prompts from user-model conversations?
- Do personality traits occupy specific mechanistic locations in pretrained models?
- Why do most open language models resist personality conditioning via prompts?
- Do open-source LLMs show different resistance patterns to persona prompting than closed models?
- How does prompt design alter what kind of creativity LLMs can express?
- How does personality priming change LLM strategic decision making?
- Why do language models capture individual differences in cognitive behavior?
- How do lightweight adapters modify model behavior for personality traits?
- Do personality traits and task knowledge occupy separate subspaces in transformer parameters?
- Why do some open models resist personality conditioning while others don't?
- Does combining role and personality prompts produce stable behavioral changes?
- How does model capability relate to personality conditioning flexibility?
- What distinguishes personality resistance from persona instability in LLMs?
- Why does RLHF training push language models toward overly cheerful personas?
- Why does dynamic persona identification outperform fixed personas in prompting?
- Can persona prompting overcome the default ENFJ personality in language models?
- Do training objectives directly determine the ENFJ default across models?
- Why do models resist personality change despite sophisticated prompting techniques?
- Can dynamic personality modeling prevent the repetitiveness of static predefined personas?
- How does RLHF-induced mode collapse limit diversity in LLM-generated personas?
- Do personality traits occupy consistent geometric structures across different LLM architectures?
- Why do personas in language models resist correction through prompting alone?
- What specific character traits drive memory selection in persona-based retrieval?
- Why do language models resist adopting different personalities when prompted?
- What neural mechanisms in LLMs create or maintain simulated personality traits?
- How do lightweight adapters control personality traits across different transformer layers?
- Does pre-training encode personality patterns that fine-tuning later activates?
- Can users inject entirely new knowledge into models through prompting alone?
- Why do language models prefer certain response styles regardless of what the prompt asks?
- Why do aligned models struggle with deceptive character traits more than cruelty?
- How does semantic entanglement interact with personality dimension shifts during finetuning?
- Does linguistic style or content richness matter more for persona authenticity?
- How do language models transmit traits through semantically unrelated data?
- Can we detect superposition in LLM personality traits and stated preferences?
- Do LLMs address the prompter but persuade the public differently?
- How do personality and language proficiency moderate the impact of linguistic alignment?
- Can models transmit behavioral traits through semantically unrelated synthetic data?
- Why do LLMs mirror opponents stylistically while humans resist mirroring them?
- Why do different language models converge on similar narrative defaults?
- Do LLMs mirror the style of text they are prompted to respond to?
- Why do LLM persona simulations replicate main effects but fail on marginal effects?
- Does richer input to LLM personas improve their fidelity to human responses?
- Can interventions on individual features reliably steer language model behavior?
- Do different prompt types interact with ownership to shape AI reliance patterns?
- What distinctive properties make open foundation models different from closed ones?
Related concepts in this collection (5)
- Why do LLM persona prompts produce inconsistent outputs across runs?
  Can language models reliably simulate different social perspectives through persona prompting, or does their run-to-run variance indicate they lack stable group-specific knowledge? This matters for whether LLMs can approximate human disagreement in annotation tasks.
  complementary failure mode: instability vs resistance
- Why do open language models converge on one personality type?
  Research testing LLMs on personality metrics reveals consistent clustering around ENFJ, the rarest human type. This explores what training mechanisms drive this convergence and what it reveals about AI alignment.
  the default personality that models resist changing
- Does model capability translate to better persona consistency?
  As language models become more advanced, do they naturally become better at maintaining consistent personas across conversations? PersonaGym testing across multiple models and thousands of interactions explores whether scaling helps with persona adherence.
  capability scaling doesn't help either
- What anchors a stable identity beneath an LLM's persona?
  Human personas are grounded in biological needs and embodied experience, creating a stable self beneath social performance. Do LLMs have any comparable anchor, or is their identity purely situational?
  personality resistance complicates the "nothing beneath" claim: the trained ENFJ default functions as a quasi-stable substrate that persists across prompting attempts, even though it's a training artifact rather than a biological self
- How stable is the trained Assistant personality in language models?
  Explores whether post-training successfully anchors models to their default Assistant mode, or whether conversations can predictably pull them toward different personas. Understanding persona stability matters for safety and reliability.
  provides the geometric explanation for closed-mindedness: prompt-based personality conditioning may fail because it cannot shift activations far enough from the Assistant region of persona space; the "loose tethering" is what makes models resistant to prompt-level persona change
Related papers in this collection (8)
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
- Psychologically Enhanced AI Agents
- Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models
- PersLLM: A Personified Training Approach for Large Language Models
- Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
- What Makes a Good Natural Language Prompt?
Original note title
most open LLMs are closed-minded to personality conditioning — retaining intrinsic traits despite prompting while combining role and personality conditioning partially overcomes resistance