Can we control personality in language models without prompting?

Can lightweight adapter modules enable continuous, fine-grained control over psychological traits in transformer outputs, independent of prompt engineering? This note explores whether architecture-level personality modification outperforms prompt-based approaches.

Synthesis note · 2026-02-23 · sourced from Psychology Therapy Practice

PsychAdapter modifies the transformer architecture to accept continuous psychological trait scores as input, enabling generation conditioned on personality, mental health, and demographic variables without consuming context window or relying on prompt engineering. The key difference from prior work: trait influence is applied at every transformer layer via a learned dimension expansion, not just at the input level.
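
To make the per-layer idea concrete, here is a minimal toy sketch in plain Python. It assumes the "dimension expansion" is a learned linear map (with a bias term) from the trait vector to one hidden-size vector per layer, and that this vector is simply added to the layer's hidden state. The note does not say how the expanded vector is injected, so the additive step, the function names, and the tiny sizes are illustrative assumptions rather than PsychAdapter's actual implementation.

```python
import random

def init_expansion(n_layers, hidden, n_traits, seed=0):
    """Hypothetical adapter weights: one (hidden x (n_traits + 1)) matrix per layer."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(n_traits + 1)] for _ in range(hidden)]
        for _ in range(n_layers)
    ]

def expand_traits(traits, weights):
    """Map a trait vector to one hidden-size offset per layer (linear map plus bias)."""
    x = list(traits) + [1.0]  # trailing 1.0 multiplies the bias column
    return [[sum(w * v for w, v in zip(row, x)) for row in layer] for layer in weights]

def inject(hidden_states, offsets):
    """Assumed injection step: add each layer's offset to that layer's hidden state."""
    return [[h + o for h, o in zip(hs, off)] for hs, off in zip(hidden_states, offsets)]

def count_params(weights):
    """Number of adapter parameters across all layers."""
    return sum(len(row) for layer in weights for row in layer)

# Toy sizes only; a real model has far more layers and hidden units.
weights = init_expansion(n_layers=4, hidden=8, n_traits=5)
offsets = expand_traits([0, 0, 3, 0, 0], weights)
print(len(offsets), len(offsets[0]), count_params(weights))  # 4 8 192
```

The point of the sketch is that every layer receives its own trait-dependent signal, while the adapter's size grows only with layers × hidden size × number of traits, not with the base model's full parameter count.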

Training uses social media and blog posts whose psychological scores are estimated by an empirically trained, language-based assessment model. While training on standard next-word prediction, the adapter learns how strongly the psychological scores should contribute at each layer. The result is fine-grained, continuous control over personality expression. An input vector of (0, 0, +3, 0, 0) generates text characteristic of high extraversion while remaining average on the other Big Five dimensions. Any combination is possible, including interactions: high openness with low extraversion produces text that captures both traits simultaneously.
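
The input vectors described above can be built by name rather than by position. The sketch below assumes the five slots follow the common OCEAN ordering (openness, conscientiousness, extraversion, agreeableness, neuroticism), which is consistent with (0, 0, +3, 0, 0) meaning high extraversion. The note does not state the ordering or the score units, so `BIG_FIVE` and `trait_vector` are hypothetical helpers for illustration only.

```python
# Assumed slot order (OCEAN); the note only shows that slot 3 is extraversion.
BIG_FIVE = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)

def trait_vector(**scores):
    """Build a 5-slot trait vector; unspecified traits default to 0 (average)."""
    unknown = set(scores) - set(BIG_FIVE)
    if unknown:
        raise ValueError(f"unknown trait(s): {sorted(unknown)}")
    return [float(scores.get(name, 0.0)) for name in BIG_FIVE]

# High extraversion, average on everything else.
print(trait_vector(extraversion=3))                # [0.0, 0.0, 3.0, 0.0, 0.0]

# An interaction: high openness combined with low extraversion.
print(trait_vector(openness=3, extraversion=-3))   # [3.0, 0.0, -3.0, 0.0, 0.0]
```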

Expert raters judged that the generated text matched the intended trait levels with 87.3% average accuracy for the Big Five personality traits and 96.7% for depression and life satisfaction. These results hold across GPT-2, Gemma (2B), and Llama 3, which suggests the approach is not tied to a single base model. The added parameters total less than 0.1% of the base model (55,296 for Gemma 2B against roughly 2 billion base parameters), so the adapter is small enough to distribute trivially.
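
As a quick check on the overhead figure, the arithmetic can be worked through directly. The parameter counts come from the note; the 2 bytes per parameter (16-bit weights) used for the size estimate is an assumption.

```python
ADAPTER_PARAMS = 55_296          # from the note, Gemma 2B adapter
BASE_PARAMS = 2_000_000_000      # from the note, rounded base model size
BYTES_PER_PARAM = 2              # assumption: 16-bit weights

fraction = ADAPTER_PARAMS / BASE_PARAMS
size_kib = ADAPTER_PARAMS * BYTES_PER_PARAM / 1024

print(f"{fraction:.4%}")         # 0.0028%
print(f"{size_kib:.0f} KiB")     # 108 KiB
```

Under these assumptions the adapter is well below the stated 0.1% bound and is on the order of a hundred kilobytes.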

Applications extend beyond mental health simulation: customer service training with diverse personalities, crisis worker training with simulated distress levels, machine translation matched to the audience's education or dialect level, and research tools that generate coherent text (not isolated words or phrases) for trait analysis. Because personality adjustment may activate latent, pre-existing patterns in model weights (see "Do personality traits activate hidden emoji patterns in language models?"), PsychAdapter may be activating these pre-existing trait-language circuits through a more precise mechanism than prompting or full fine-tuning.

Inquiring lines that read this note 50

This note is a source for these research framings, grouped by the broader line of inquiry each explores.

What prevents language models from reliably adopting diverse personas?

How can conversational AI maintain consistent personas across conversations?

Why do persona-level simulations fail to predict individual preferences accurately?

What makes AI persuasion effective and how can we counter it?

What defenses exist against personality-based psychological targeting at scale?

Can prompting inject entirely new knowledge into language models?

Can AI systems balance emotional competence with factual reliability?

What makes trait-level warmth different from behavior-level emotion rewards in AI?

How should personalization be implemented to improve AI assistant effectiveness?

Do language model representations contain causally steerable task-specific features?

Why do LLM chatbots fail as independent therapeutic agents?

Can personality control improve training outcomes for crisis workers and therapists?

Does fine-tuning modify underlying model capabilities or only behavioral outputs?

Can training data analysis predict which samples will cause unintended personality changes?

What structural biases does transformer attention create in language model outputs?

How does transformer attention architecture amplify identity-congruent biases in persona-assigned models?

Does alignment training create blind spots in detecting genuine safety threats?

Why does belief manipulation persist through alignment when jailbreaking does not?

Do harness improvements transfer across model scales or memorize shortcuts?

Can per-user adapters remain consistent without drifting or leaking?

Related concepts in this collection 3

Do personality traits activate hidden emoji patterns in language models? When large language models are fine-tuned on personality traits, do they spontaneously generate emojis that were never in their training data? This explores whether personality adjustment activates latent, pre-existing patterns in model weights.
Relation: complementary mechanism; PsychAdapter modifies every layer while neuron-level work identifies specific locations.

Can we track and steer personality shifts during model finetuning? This research explores whether personality traits in language models occupy specific linear directions in activation space, and whether we can detect and control unwanted personality changes during training using these geometric directions.
Relation: persona vectors operate in activation space; PsychAdapter operates in weight space; both enable personality control.

Can open language models adopt different personalities through prompting? Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
Relation: PsychAdapter bypasses prompt resistance entirely by operating at the architecture level.

Original note title

lightweight psychological trait adapters modify every transformer layer with less than 0.1 percent additional parameters — enabling fine-grained psychological profile control independent of prompting
