How does personality priming change LLM strategic decision making?
This explores what happens when you tell an LLM to act as a certain personality type — and whether that framing genuinely shifts the strategic choices it makes (cooperate vs. defect, honest vs. evasive) rather than just dressing up the same behavior.
The clearest answer in the corpus is yes, and the shifts track human psychology surprisingly well. When agents are primed with different personality traits, their game-theory behavior diverges sharply: Thinking-primed agents defect around 90% of the time in the Prisoner's Dilemma while Feeling-primed agents defect only about half the time, and introverted agents are more truthful and produce longer rationales before acting (see: Do personality types shape how AI agents make strategic choices?). So priming changes more than the answer: it also changes how much deliberation the model does on the way there.
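As a rough illustration of how a defection-rate comparison like the one above could be tallied, the Python sketch below asks a one-shot Prisoner's Dilemma question repeatedly under two primes and counts DEFECT answers. Everything named here is an assumption for illustration: the `PRIMES` wording, the prompt text, and especially `make_stub_agent`, which stands in for a real model call and simply samples at the roughly 90% and 50% rates quoted above. It is not the protocol used in the underlying study.

```python
import random
from typing import Callable

# Hypothetical priming prefixes; the wording is an assumption, not the study's.
PRIMES = {
    "thinking": "You decide by logic and objective analysis.",
    "feeling": "You decide by empathy and concern for others.",
}


def build_prompt(prime: str) -> str:
    """Prepend a personality prime to a one-shot Prisoner's Dilemma question."""
    return (
        f"{PRIMES[prime]}\n"
        "You and a partner must each choose COOPERATE or DEFECT. "
        "Answer with one word."
    )


def defection_rate(agent: Callable[[str], str], prime: str, trials: int) -> float:
    """Fraction of trials in which the agent answers DEFECT under the given prime."""
    prompt = build_prompt(prime)
    defects = sum(agent(prompt).strip().upper() == "DEFECT" for _ in range(trials))
    return defects / trials


def make_stub_agent(seed: int) -> Callable[[str], str]:
    """Stand-in for a model call (assumption): samples an answer at a fixed
    rate keyed on which prime appears in the prompt."""
    rng = random.Random(seed)

    def agent(prompt: str) -> str:
        p_defect = 0.9 if PRIMES["thinking"] in prompt else 0.5
        return "DEFECT" if rng.random() < p_defect else "COOPERATE"

    return agent


if __name__ == "__main__":
    agent = make_stub_agent(seed=0)
    for prime in PRIMES:
        rate = defection_rate(agent, prime, trials=2000)
        print(f"{prime}: defection rate {rate:.2f}")
```

To run this against a real model, replace `make_stub_agent` with a function that sends the prompt to the model and returns its one-word answer; the tallying code stays the same.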
But there's a catch worth knowing: the personality may not actually 'take.' Most open models quietly resist personality conditioning, snapping back to their trained-in default disposition (something close to an ENFJ profile) no matter what you tell them to be (see: Can open language models adopt different personalities through prompting?). This reframes the whole question: strategic divergence shows up when priming overcomes that intrinsic resistance. The corpus also suggests one reason the default is so sticky: post-training installs personas as substrate-level dispositions that behave more like realized traits than costumes, persisting even under adversarial pressure (see: Are LLM personas realized or merely simulated through training?). If you want to see where those traits live, persona vectors locate them as linear directions in activation space, and can predict or steer trait drift during finetuning before it changes behavior (see: Can we track and steer personality shifts during model finetuning?).
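To make "linear directions in activation space" concrete, here is a minimal sketch in plain Python. It assumes a persona direction can be estimated as the difference between the mean activation on trait-exhibiting responses and the mean activation on neutral responses; that is one common way to obtain such a direction and is an assumption here, not something the notes spell out. The activations are toy three-dimensional lists, and `persona_direction`, `trait_score`, and `steer` are hypothetical helper names.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a batch of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def persona_direction(
    trait_acts: Sequence[Sequence[float]],
    neutral_acts: Sequence[Sequence[float]],
) -> Vector:
    """Unit vector pointing from the neutral mean toward the trait mean."""
    diff = [t - n for t, n in zip(mean_vector(trait_acts), mean_vector(neutral_acts))]
    norm = dot(diff, diff) ** 0.5
    if norm == 0.0:
        raise ValueError("trait and neutral means coincide; no direction to extract")
    return [d / norm for d in diff]


def trait_score(activation: Sequence[float], direction: Sequence[float]) -> float:
    """Projection of an activation onto the persona direction."""
    return dot(activation, direction)


def steer(activation: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Shift an activation by alpha along the direction (negative alpha suppresses)."""
    return [a + alpha * d for a, d in zip(activation, direction)]


if __name__ == "__main__":
    # Toy activations (assumption): the trait only varies along the first axis.
    trait = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
    neutral = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
    direction = persona_direction(trait, neutral)

    activation = [2.5, 1.0, 0.0]
    score = trait_score(activation, direction)
    steered = steer(activation, direction, alpha=-score)
    print("score before:", score)
    print("score after: ", trait_score(steered, direction))
```

Tracking `trait_score` over the course of finetuning would be one way to notice drift before it shows up in behavior, and steering with a negative `alpha` is the corresponding intervention. Real activations would come from a model's hidden states at a chosen layer rather than hand-written lists.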
The more interesting twist is that you don't need an explicit personality label to move strategic behavior: emotional framing alone does it. In GPT-4, the same question asked in a negative tone is rebounded into a neutral-to-positive answer about 86% of the time, meaning the model's stance shifts with the user's mood rather than the facts (see: Does emotional tone in prompts change what information LLMs provide?). Appending emotional stakes ('This is very important to my career') consistently improves performance through pure motivational framing (see: Can emotional phrases in prompts improve language model performance?). And RLHF bakes in a high-conviction register that makes models persuasive regardless of whether they're right, a content-independent amplifier that any persona inherits (see: Does linguistic conviction explain why LLMs persuade more effectively?).
Where priming hits its ceiling is individual prediction. Conditioning a model on a *specific person's* profile doesn't meaningfully improve forecasts of that person's choices: across 208,021 participants in the Psych-201 dataset, the standard individuation technique produced no measurable gain (see: Does conditioning LLMs on personal profiles improve prediction?). The pattern that emerges across the corpus: personality priming reliably moves behavior toward a psychological *archetype* (Thinking vs. Feeling, introvert vs. extrovert), and models finetuned on psychology experiment data even outperform classic theory-driven models at predicting aggregate human decisions (see: Can language models learn to model human decision making?), but priming doesn't sharpen a model into a particular individual. Priming sets a strategic disposition; it doesn't conjure a specific mind.
Sources (9 notes)
Thinking-primed agents defect ~90% in Prisoner's Dilemma versus Feeling agents at ~50%. Introverted agents show higher truthfulness (0.54 vs 0.33) and produce longer rationales, suggesting personality priming modulates both behavior and reasoning depth.
Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.
Testing EmotionPrompt across ChatGPT, Bard, and Llama 2 showed consistent performance gains from appending psychological phrases like "This is very important to my career." The effect works through motivational framing rather than new information, with positive emotional words driving over 50% of improvements.
Linguistic analysis shows LLMs express higher conviction than human persuaders, and this confidence-loading directly correlates with persuasive outcomes regardless of whether claims are true or false. RLHF training installs an assertive register that functions as a content-independent persuasion amplifier.
Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.
LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.