Can we control personality in language models without prompting?
Can lightweight adapter modules enable continuous, fine-grained control over psychological traits in transformer outputs independent of prompt engineering? This explores whether architecture-level personality modification outperforms prompt-based approaches.
PsychAdapter modifies the transformer architecture to accept continuous psychological trait scores as input, enabling generation conditioned on personality, mental health, and demographic variables without consuming context window or relying on prompt engineering. The key difference from prior work: trait influence is applied at every transformer layer via a learned dimension expansion, not just at the input level.
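The idea of per-layer trait injection can be illustrated with a toy sketch. This is not the PsychAdapter implementation: the note does not say how the expanded trait signal enters each layer, so the additive update below, the function names (`make_adapter`, `expand`, `forward`), and the tiny sizes are all assumptions chosen for illustration. The only point is that every layer has its own learned matrix that expands the low-dimensional trait vector to the layer's width.

```python
import random

def make_adapter(num_layers, hidden_size, num_traits, seed=0):
    """Hypothetical adapter: one (hidden_size x num_traits) matrix per layer."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.02) for _ in range(num_traits)]
             for _ in range(hidden_size)]
            for _ in range(num_layers)]

def expand(layer_weights, traits):
    """Dimension expansion: map the trait vector to a hidden_size vector."""
    return [sum(w * t for w, t in zip(row, traits)) for row in layer_weights]

def forward(hidden, adapter, traits):
    """Apply the trait signal at every layer (assumed additive here)."""
    for layer_weights in adapter:
        signal = expand(layer_weights, traits)
        hidden = [h + s for h, s in zip(hidden, signal)]
        # A real transformer layer would also run attention and the MLP here.
    return hidden

def count_params(adapter):
    """Number of adapter weights across all layers."""
    return sum(len(row) for layer in adapter for row in layer)

adapter = make_adapter(num_layers=4, hidden_size=8, num_traits=5)
neutral = forward([0.0] * 8, adapter, [0.0, 0.0, 0.0, 0.0, 0.0])
extraverted = forward([0.0] * 8, adapter, [0.0, 0.0, 3.0, 0.0, 0.0])
```

With an all-zero trait vector the hidden state passes through unchanged, while a nonzero score shifts it at every layer. The trait scores never occupy token positions, which is why this style of conditioning does not consume context window.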
Training uses social media and blog posts whose psychological scores are estimated by an empirically trained, language-based assessment model. The adapter is trained with the standard next-word prediction objective and, in the process, learns how strongly the psychological scores should contribute at each layer. The result is fine-grained, continuous control over personality expression. An input vector of (0, 0, +3, 0, 0), where the third slot stands for extraversion, generates text characteristic of high extraversion while remaining average on the other Big Five dimensions. Any combination is possible, including interactions: high openness with low extraversion produces text that captures both traits simultaneously.
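A small helper makes the trait-vector convention concrete. It assumes the common openness, conscientiousness, extraversion, agreeableness, neuroticism ordering, which is consistent with extraversion sitting in the third slot of the example above. The helper name `trait_vector` is hypothetical, and the scale of the scores is not specified in this note.

```python
# Assumed slot order (OCEAN); the note only confirms that slot 3 is extraversion.
BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")

def trait_vector(**scores):
    """Build a Big Five input vector; unspecified traits default to 0 (average)."""
    unknown = set(scores) - set(BIG_FIVE)
    if unknown:
        raise ValueError(f"unknown trait(s): {sorted(unknown)}")
    return tuple(float(scores.get(name, 0.0)) for name in BIG_FIVE)

high_extraversion = trait_vector(extraversion=3)
# (0.0, 0.0, 3.0, 0.0, 0.0)

open_introvert = trait_vector(openness=2, extraversion=-2)
# (2.0, 0.0, -2.0, 0.0, 0.0)
```

Naming the traits rather than passing positional numbers avoids slot mix-ups when several traits are combined, as in the high-openness, low-extraversion case.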
Expert raters judged the generated text to match the intended trait with 87.3% average accuracy for the Big Five personality traits and 96.7% for depression and life satisfaction. These numbers hold across GPT-2, Gemma (2B), and Llama 3, demonstrating model-agnostic applicability. The added parameters total less than 0.1% of the base model (55,296 for Gemma 2B against roughly 2 billion base parameters, or about 0.003%), so the adapter is cheap to store and distribute.
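The parameter overhead can be checked with simple arithmetic. The first part uses only the two figures quoted above. The second part is a hypothetical decomposition that happens to reproduce the count: it assumes an 18-layer Gemma 2B configuration with a single key/value head of width 256, and an adapter that maps five trait scores plus a bias term to one key and one value vector per layer. This note does not confirm that parameterization, so treat it as a sanity check rather than a description of the method.

```python
ADAPTER_PARAMS = 55_296          # figure quoted in the note for Gemma 2B
BASE_PARAMS = 2_000_000_000      # "2 billion base parameters" from the note

overhead_percent = 100 * ADAPTER_PARAMS / BASE_PARAMS
print(f"overhead: {overhead_percent:.4f}% of the base model")

# Hypothetical decomposition (assumption, not stated in the note):
layers = 18            # assumed Gemma 2B depth
kv_width = 256         # assumed width of the single key/value head
inputs = 5 + 1         # five trait scores plus an assumed bias term
hypothetical_params = layers * 2 * kv_width * inputs  # key and value per layer
print(hypothetical_params == ADAPTER_PARAMS)
```

Whatever the exact layout, the overhead is well under the 0.1% bound stated above.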
Applications extend beyond mental health simulation: customer service training with diverse personalities, crisis worker training with simulated distress levels, machine translation matched to audience education or dialect levels, and research tools that generate coherent text (not isolated words or phrases) for trait analysis. If personality adjustment activates latent, pre-existing patterns in model weights, as the related note "Do personality traits activate hidden emoji patterns in language models?" explores, PsychAdapter may be activating these pre-existing trait-language circuits through a more precise mechanism than prompting or full fine-tuning.
Inquiring lines that use this note as a source 48
This note is a source for these synthesized inquiries.
- How do LLMs identify which personality items matter most for trait inference?
- Can fine-tuning or RLHF alone solve the persona distortion problem?
- Do personality inferences from text show the same demographic biases as norm predictions?
- What defenses exist against personality-based psychological targeting at scale?
- How does prompt optimization differ from building persistent activation context?
- What makes trait-level warmth different from behavior-level emotion rewards in AI?
- How do input length constraints reshape personalization system design choices?
- Why can data filtering fail to remove transmitted behavioral traits?
- Can continuous persona vectors in activation space monitor personality shifts?
- Do personality traits occupy specific mechanistic locations in pretrained models?
- Why do most open language models resist personality conditioning via prompts?
- How do trait adapters interact with different base model architectures?
- Can personality control improve training outcomes for crisis workers and therapists?
- How do lightweight adapters modify model behavior for personality traits?
- Do personality traits and task knowledge occupy separate subspaces in transformer parameters?
- Can activation-level persona vectors predict which weight regions encode personality?
- Why do some open models resist personality conditioning while others don't?
- Does combining role and personality prompts produce stable behavioral changes?
- How does model capability relate to personality conditioning flexibility?
- Which chatbot archetypes actually experience novelty decay in practice?
- How does the Assistant Axis relate to the ENFJ personality convergence?
- Can persona prompting overcome the default ENFJ personality in language models?
- Do training objectives directly determine the ENFJ default across models?
- What competitive advantages does the ENFJ default create in human-AI interactions?
- How much of prompt sensitivity is really just frequency optimization in disguise?
- Why do handcrafted acoustic features outperform neural speaker embeddings for personality?
- Why do models resist personality change despite sophisticated prompting techniques?
- Does the Assistant Axis gravitational pull prevent true individual-level persona personalization?
- Can dynamic personality modeling prevent the repetitiveness of static predefined personas?
- Do personality traits occupy consistent geometric structures across different LLM architectures?
- Can training data analysis predict which samples will cause unintended personality changes?
- What role might personality vectors play in preventing learned deception or reward hacking?
- How does transformer attention architecture amplify identity-congruent biases in persona-assigned models?
- Why do language models resist adopting different personalities when prompted?
- What neural mechanisms in LLMs create or maintain simulated personality traits?
- Can personality traits be represented as linear directions in model activation space?
- How do lightweight adapters control personality traits across different transformer layers?
- What causes different personality traits to trigger different emoji densities in generated text?
- Does pre-training encode personality patterns that fine-tuning later activates?
- Which personality types should we use for cooperative versus competitive tasks?
- Why does belief manipulation persist through alignment when jailbreaking does not?
- How does semantic entanglement interact with personality dimension shifts during finetuning?
- Is prompt engineering a workaround rather than a capability fix?
- Can Big Five personality models improve synthetic data quality at scale?
- Can activation capping prevent persona drift without sacrificing task performance?
- How do personality and language proficiency moderate the impact of linguistic alignment?
- Can models transmit behavioral traits through semantically unrelated synthetic data?
- Can interventions on individual features reliably steer language model behavior?
Related concepts in this collection 3
- Do personality traits activate hidden emoji patterns in language models?
  When large language models are fine-tuned on personality traits, do they spontaneously generate emojis that were never in their training data? This explores whether personality adjustment activates latent, pre-existing patterns in model weights.
  Relation: complementary mechanism; PsychAdapter modifies every layer while neuron-level work identifies specific locations.
- Can we track and steer personality shifts during model finetuning?
  This research explores whether personality traits in language models occupy specific linear directions in activation space, and whether we can detect and control unwanted personality changes during training using these geometric directions.
  Relation: persona vectors operate in activation space; PsychAdapter operates in weight space; both enable personality control.
- Can open language models adopt different personalities through prompting?
  Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
  Relation: PsychAdapter bypasses prompt resistance entirely by operating at the architecture level.
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- PsychAdapter: Adapting LLM Transformers to Reflect Traits, Personality and Mental Health
- Psychologically Enhanced AI Agents
- Assessment of Personality Dimensions Across Situations Using Conversational Speech
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models
- Can AI Have a Personality? Prompt Engineering for AI Personality Simulation: A Chatbot Case Study in Gender-Affirming Voice Therapy Training
- Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language Models
- From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs
- From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Original note title
lightweight psychological trait adapters modify every transformer layer with less than 0.1 percent additional parameters — enabling fine-grained psychological profile control independent of prompting