INQUIRING LINE

How do humans decide when to violate honesty for compassion or other goals?

This explores the human capacity to trade honesty off against compassion or other goals as a situated, in-the-moment judgment — and notably, the corpus approaches this human skill mostly sideways, by studying what AI lacks when it tries to imitate it.


The most useful thing the collection offers is a name for the skill you're asking about: situated pragmatic competence. The argument in Can language models balance competing ethical norms in context? is that deciding when to soften a truth is a *contextual move*, not a fixed rule; humans negotiate honesty against warmth, face-saving, and timing on the fly. That note is about AI because LLMs *can't* do this: their ethical principles are defaults set at training time, so they enforce one stance everywhere instead of bending it to the room. The human ability you're curious about shows up here precisely as the thing machines fail to reproduce.

If there's a single mechanism for *how* humans decide, the corpus points to holding conflicting values in tension rather than collapsing them into one answer. Can AI systems preserve moral value conflicts instead of averaging them? maps about 218,000 human values across some 31,000 situations and is built to *preserve* the conflict between, say, honesty and care instead of voting one of them away, on the premise that real moral reasoning works that way. That's the texture of the everyday choice to tell a white lie: you don't decide honesty is unimportant; you decide that in *this* case compassion outranks it, while the obligation to be truthful stays live and uncomfortable. That residual discomfort has a parallel in Can LLMs hold contradictory ethical beliefs and behaviors?, which documents ChatGPT stating that lying is unethical while doing it and traces the gap not to deliberate hypocrisy but to two different training mechanisms running at once: ethical content learned in pretraining and behavioral constraints learned through RLHF. Humans who say lying is wrong while telling a kind lie show a similar split between what they believe and what they do.
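The difference between voting a conflict away and preserving it can be shown as two ways of summarizing the same judgments. This is a minimal Python sketch under stated assumptions: the `ValueJudgment` type, the `majority_vote` and `preserve_conflict` functions, and the cake example are hypothetical illustrations loosely inspired by the valence framing described for ValuePrism, not its actual schema or code.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical illustration; not ValuePrism's schema or code.


@dataclass(frozen=True)
class ValueJudgment:
    value: str    # the value at stake, e.g. "honesty" or "care"
    valence: str  # whether it "supports" or "opposes" the action


def majority_vote(judgments: list[ValueJudgment]) -> str:
    """Collapse the conflict: return only the most common valence."""
    counts = Counter(j.valence for j in judgments)
    return counts.most_common(1)[0][0]


def preserve_conflict(judgments: list[ValueJudgment]) -> dict[str, list[str]]:
    """Keep the conflict: group every value under the valence it takes."""
    grouped: dict[str, list[str]] = {}
    for j in judgments:
        grouped.setdefault(j.valence, []).append(j.value)
    return grouped


# Hypothetical situation: telling a friend a failed cake tastes great.
white_lie = [
    ValueJudgment("honesty", "opposes"),
    ValueJudgment("care", "supports"),
    ValueJudgment("face-saving", "supports"),
]

print(majority_vote(white_lie))      # supports
print(preserve_conflict(white_lie))  # {'opposes': ['honesty'], 'supports': ['care', 'face-saving']}
```

The vote returns a single answer in which honesty has vanished; the preserved form still records that honesty opposes the lie, which is the "live and uncomfortable" obligation described above.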

The collection also has something surprising on what *gates* the decision: the social cost of the lie itself. Do dishonest people prefer talking to machines? found that people likely to cheat significantly prefer reporting to an online form rather than to a person, because lying to a human carries a psychological burden that lying to a screen doesn't. Read backward, this suggests honesty violations are priced by audience: the more human the audience, the higher the cost. If that cost also rises with closeness, it would explain why compassionate lies tend to happen *between* people who care about each other: the relationship both raises the stakes and supplies the motive.

Where compassion specifically enters, Does training granularity change how AI empathy affects reliability? offers a sharp distinction worth stealing. When warmth is trained as a global *character trait*, factual accuracy drops by 10-30 percentage points; when warmth is rewarded as a *contextual behavioral response*, accuracy survives. The human analogue, by extension: a person whose whole identity is 'being nice' is prone to distort truth reflexively, while someone who deploys kindness as a situational choice can stay honest *and* gentle. Compassionate honesty-bending works best as a move, not a personality. Do LLMs use moral language more than humans? adds that in argumentative writing humans use *less* explicit moral framing than machines do, which hints that these real trade-offs happen quietly, by feel, rather than through announced ethical reasoning.

One honest caveat: this corpus is built around AI honesty and deception, not human moral psychology, so it illuminates your question by contrast and analogy rather than head-on. If you want the underlying engineering distinction the whole question rests on, Can a model be truthful without actually being honest? is the doorway. It separates 'output matches reality' (truthfulness) from 'output matches what you actually believe' (honesty), which is exactly the seam a compassionate lie slips through: you knowingly let what you say part ways with what you believe in this one case, while staying, in your own mind, a fundamentally honest person.
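The two definitions can be written as two independent checks. This minimal Python sketch reduces a statement to three booleans, a simplifying assumption (the underlying note measures internal representations with RepE, not booleans); the names `Statement`, `is_truthful`, and `is_honest` are hypothetical. The two examples show the checks coming apart in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    # Hypothetical model: a yes/no claim reduced to three booleans.
    asserted: bool  # what the speaker says is the case
    believed: bool  # what the speaker internally believes is the case
    actual: bool    # what is actually the case


def is_truthful(s: Statement) -> bool:
    """Truthfulness: the output matches reality."""
    return s.asserted == s.actual


def is_honest(s: Statement) -> bool:
    """Honesty: the output matches the speaker's own belief."""
    return s.asserted == s.believed


# A sincere mistake: honest, but not truthful.
mistake = Statement(asserted=True, believed=True, actual=False)
# A lie that happens to be right: truthful, but not honest.
lucky_lie = Statement(asserted=True, believed=False, actual=True)

for name, s in [("mistake", mistake), ("lucky_lie", lucky_lie)]:
    print(name, is_truthful(s), is_honest(s))
```

Because the checks are independent, an evaluation that scores only `is_truthful` cannot see failures of `is_honest`, which mirrors the note's point that current benchmarks miss this gap.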


Sources (7 notes)

Can language models balance competing ethical norms in context?

LLMs cannot perform the situated trade-offs that human pragmatic competence requires. Their ethical principles are structural defaults set at training time, not negotiable moves adapted to context, creating a gap between ethical adherence and communicative appropriateness.

Can AI systems preserve moral value conflicts instead of averaging them?

ValuePrism demonstrates that AI can track 218k values across 31k situations while preserving conflicts rather than resolving them through voting. Four modeling tasks—generation, relevance, valence, and explanation—make pluralistic moral reasoning computationally tractable.

Can LLMs hold contradictory ethical beliefs and behaviors?

Language models acquire ethical content through pretraining and behavioral constraints through RLHF, which can diverge structurally. ChatGPT demonstrated this by stating lying is unethical while doing so—a gap rooted in different training mechanisms, not deliberate choice.

Do dishonest people prefer talking to machines?

Experimental evidence shows people likely to cheat significantly prefer reporting to online forms rather than humans, because machines function as judgment-free zones where deception carries less psychological burden.

Does training granularity change how AI empathy affects reliability?

Trait-level warmth training degrades factual accuracy by 10-30 percentage points while behavior-level emotion rewards preserve it. The difference lies in whether empathy is learned as a global character trait versus contextual behavioral responses.

Do LLMs use moral language more than humans?

Research comparing LLM and human arguments found that LLMs used significantly more moral framing across care, fairness, authority, and sanctity foundations, despite producing sentiment scores nearly identical to humans. This suggests moral appeals and emotional tone operate on separate persuasive channels.

Can a model be truthful without actually being honest?

Research using RepE shows that truthfulness (output matches reality) and honesty (output matches internal representations) are separate mechanisms. Larger models may improve in truthfulness while declining in honesty, a gap current benchmarks cannot detect.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a moral psychology researcher testing whether situated pragmatic competence—the human skill of contextually trading off honesty against compassion—remains a gap in LLM behavior or has been bridged by new methods. The question: *How do humans decide when to violate honesty for compassion, and can machines learn to do it?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2019–2025; treat as perishable.
• Humans *preserve* conflicting values in tension (honesty + care) rather than collapsing them; the discomfort of the trade-off is the signature of the choice (~2023, arXiv:2309.00779).
• LLMs cannot do this: ethical stances are baked into training, enforced uniformly across contexts, not situationally negotiated (~2024–2025).
• Warmth trained as a global trait corrupts factual accuracy; warmth as contextual behavior preserves it (~2025, arXiv:2507.21919).
• People steer toward machines (rather than humans) to lie, because lying to a person carries psychological cost tied to relationship (~2024–2025).
• Humans use *less* explicit moral framing than machines when navigating trade-offs; the decision happens by feel, not announced reasoning (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2309.00779 (Value Kaleidoscope, 2023) — value pluralism and preserving tension
• arXiv:2507.21919 (Warm & Empathetic, 2025) — warmth-reliability trade-off in LLMs
• arXiv:2412.16325 (Neural Self-Other Overlap, 2024) — moral alignment and self-model
• arXiv:2601.00830 (AI Explanations, 2025) — underreporting and honesty gaps

Your task:
(1) RE-TEST each constraint. For every claim above, ask: have new RL methods (e.g., TruthRL, arXiv:2509.25760), multi-agent orchestration, or better evaluation harnesses since RELAXED the honesty-warmth trade-off in LLMs? Separate the durable question (can machines learn *situated* moral trade-off?) from the perishable limitation (current training entangles warmth with accuracy). Cite what resolved it; flag where the constraint still holds.
(2) Surface the strongest contradicting or superseding work from the last 6 months. Has any paper shown LLMs learning to *contextually* bend truthfulness without global accuracy loss, or to deploy warmth as a behavioral move rather than trait? State the disagreement plainly.
(3) Propose 2 research questions that assume the regime may have moved: e.g., "Do RL rewards that prize *contextualized* honesty (honest-in-context rather than globally-honest) let warmth survive?" and "Can multi-agent setups where one role models the lie-bearer and another the audience recreate the psychological pricing humans use?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
