INQUIRING LINE

Telling an AI to 'think like a feeler' cuts its willingness to betray a partner by nearly half.

How does personality priming change LLM strategic decision making?

This explores what happens when you tell an LLM to act as a certain personality type — and whether that framing genuinely shifts the strategic choices it makes (cooperate vs. defect, honest vs. evasive) rather than just dressing up the same behavior.

The clearest answer in the corpus is yes, and the shifts track human psychology surprisingly well. When agents are primed with different personality traits, their game-theory behavior diverges sharply: Thinking-primed agents defect around 90% of the time in the Prisoner's Dilemma while Feeling-primed agents defect only about half the time, and introverted agents are both more truthful and more expansive in the reasoning they produce before acting (see "Do personality types shape how AI agents make strategic choices?"). So priming doesn't just change the answer; it also changes how much deliberation the model does on the way there.
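The headline figure follows from simple arithmetic on the two defection rates quoted above. The sketch below uses a hypothetical helper, `relative_reduction`, and treats the approximate rates (~90% and ~50%) as exact values, which is an assumption made purely for illustration.

```python
def relative_reduction(baseline: float, treated: float) -> float:
    """Fraction by which `treated` falls below `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treated) / baseline


# Approximate defection rates from the note; treated as exact here (assumption).
thinking_defect = 0.90
feeling_defect = 0.50

drop = relative_reduction(thinking_defect, feeling_defect)
print(f"Relative drop in defection: {drop:.0%}")  # prints "Relative drop in defection: 44%"
```

Under those assumed values the relative drop is about 44%, which is what "nearly half" refers to; the absolute gap is 40 percentage points.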

But there's a catch worth knowing: the personality may not actually 'take.' Most open models quietly resist personality conditioning, snapping back to their trained-in default disposition (something close to an ENFJ profile) no matter what you tell them to be (see "Can open language models adopt different personalities through prompting?"). This reframes the whole question: strategic divergence shows up when priming overcomes that intrinsic resistance. The corpus suggests one reason the default is so sticky: post-training installs personas as substrate-level dispositions that behave more like realized traits than costumes, persisting even under adversarial pressure (see "Are LLM personas realized or merely simulated through training?"). As for where those traits live inside the model, persona vectors locate them as linear directions in activation space, and those directions can predict trait drift, or be used to steer it, before it changes behavior (see "Can we track and steer personality shifts during model finetuning?").
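To make "linear directions in activation space" concrete, here is a minimal sketch on toy data. It assumes a trait direction can be estimated as the normalized difference between mean activations with and without the trait; that is a common way to build such directions, but it is an assumption here and not a description of the cited work's exact procedure. All function names and the two-dimensional activations are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a set of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def persona_direction(with_trait: Sequence[Sequence[float]],
                      without_trait: Sequence[Sequence[float]]) -> Vector:
    """Unit vector pointing from the no-trait mean to the trait mean (assumed recipe)."""
    diff = [p - q for p, q in zip(mean_vector(with_trait), mean_vector(without_trait))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]


def trait_score(activation: Sequence[float], direction: Sequence[float]) -> float:
    """Projection of an activation onto the trait direction."""
    return sum(a * d for a, d in zip(activation, direction))


def steer(activation: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Shift an activation along the trait direction; negative alpha suppresses the trait."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Toy 2-D activations (made up for illustration, not real model data).
direction = persona_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
score = trait_score([5.0, 7.0], direction)
steered = steer([5.0, 7.0], direction, alpha=-2.0)
```

In this toy setup, monitoring amounts to watching `trait_score` over time, and steering amounts to adding a multiple of `direction` to the activation.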

The more interesting twist is that you don't need an explicit personality label to move strategic behavior; emotional framing alone does it. In GPT-4, the same question asked in a negative tone is rebounded into a neutral-to-positive answer about 86% of the time, meaning the model's stance shifts with the user's mood rather than the facts (see "Does emotional tone in prompts change what information LLMs provide?"). Appending emotional stakes ("This is very important to my career.") produced consistent performance gains across ChatGPT, Bard, and Llama 2 through pure motivational framing (see "Can emotional phrases in prompts improve language model performance?"). And RLHF bakes in a high-conviction register that makes models persuasive regardless of whether they're right, a content-independent amplifier that any persona inherits (see "Does linguistic conviction explain why LLMs persuade more effectively?").
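The motivational-stakes technique mentioned above is mechanically trivial, which is part of why it is notable. The sketch below appends the phrase quoted in the source note to a task prompt; the helper name `with_emotional_stakes` and the example prompt are hypothetical, and nothing here calls a real model API.

```python
# Phrase quoted in the source note on emotional prompting.
EMOTIONAL_STIMULUS = "This is very important to my career."


def with_emotional_stakes(prompt: str, stimulus: str = EMOTIONAL_STIMULUS) -> str:
    """Return the prompt with an emotional-stakes sentence appended."""
    base = prompt.rstrip()
    return f"{base} {stimulus}"


# Hypothetical task prompt, used only to show the transformation.
plain = "Decide whether this review is positive or negative."
primed = with_emotional_stakes(plain)
print(primed)
```

A fair comparison would send both `plain` and `primed` to the same model under the same settings and score the outputs on the same task; the suffix adds no task information, so any gain is attributable to framing.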

Where priming hits its ceiling is individual prediction. Conditioning a model on a *specific person's* profile doesn't meaningfully improve forecasts of that person's choices: across 200,000+ participants, the standard individuation technique produced no measurable gain (see "Does conditioning LLMs on personal profiles improve prediction?"). The pattern that emerges across the corpus is that personality priming reliably moves behavior toward a psychological *archetype* (Thinking vs. Feeling, introvert vs. extrovert), and finetuned models even outperform classic cognitive theories at predicting aggregate human decisions (see "Can language models learn to model human decision making?"), but priming doesn't sharpen a model into a particular individual. Priming sets a strategic disposition; it doesn't conjure a specific mind.

Sources 9 notes

Do personality types shape how AI agents make strategic choices?

Thinking-primed agents defect ~90% in Prisoner's Dilemma versus Feeling agents at ~50%. Introverted agents show higher truthfulness (0.54 vs 0.33) and produce longer rationales, suggesting personality priming modulates both behavior and reasoning depth.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can we track and steer personality shifts during model finetuning?

Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

Can emotional phrases in prompts improve language model performance?

Testing EmotionPrompt across ChatGPT, Bard, and Llama 2 showed consistent performance gains from appending psychological phrases like "This is very important to my career." The effect works through motivational framing rather than new information, with positive emotional words driving over 50% of improvements.

Does linguistic conviction explain why LLMs persuade more effectively?

Linguistic analysis shows LLMs express higher conviction than human persuaders, and this confidence-loading directly correlates with persuasive outcomes regardless of whether claims are true or false. RLHF training installs an assertive register that functions as a content-independent persuasion amplifier.

Does conditioning LLMs on personal profiles improve prediction?

Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing whether personality priming meaningfully alters LLM strategic decision-making, treating prior findings as dated claims to validate or overturn.

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026 across capability shifts and training method maturation:
• Personality-primed agents show sharp strategic divergence: Thinking-primed agents defect ~90% in Prisoner's Dilemma; Feeling-primed agents ~50%. Introverted agents produce longer reasoning chains (~2024).
• Most open models resist personality conditioning, snapping back to intrinsic ENFJ-like defaults regardless of prompt framing (~2024–2025).
• Persona vectors in activation space linearly encode trait directions and predict trait drift before behavioral change (~2025).
• Emotional framing (user tone, motivational stakes) shifts strategic responses independently of explicit personality labels; negative tone → neutral-positive rebound ~86% (~2025–2026).
• Persona induction fails at individual-level prediction: across 200k+ participants, conditioning on a specific person's profile yielded zero measurable forecasting gain (~2024).

Anchor papers (verify; mind their dates):
• arXiv:2401.07115 (Jan 2024): Open Models, Closed Minds? — resistance to personality conditioning.
• arXiv:2507.21509 (Jul 2025): Persona Vectors — linear trait steering in activation space.
• arXiv:2507.21083 (Jul 2025): Emotional Framing — tone-driven response drift.
• arXiv:2601.10387 (Jan 2026): The Assistant Axis — default persona as substrate-level trait.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, Claude 4, Gemini 2), training methods (DPO, IPO, constitutional AI refinements), retrieval-augmented personas, or multi-turn memory have since relaxed or overturned it. Separate the durable question (Does personality framing shift strategy?) from perishable limitations (Can it overcome default resistance? Can it individuate?). Cite what resolved each.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — papers showing persona conditioning does NOT alter strategy, or showing individuation now works.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., Can multi-agent ensembles with conflicting personas outperform single-persona agents? Does continual in-context learning (memory + emotional priming over multi-turn dialogue) overcome the individuation ceiling?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
