Does reducing social judgment help both honesty and dishonesty equally?
This explores a tension hiding in one mechanism: removing social judgment lowers the cost of speaking freely — but "freely" cuts two ways. The corpus suggests the absence of a human audience doesn't make people more *honest* or more *dishonest* in general; it makes them more *willing to say the thing they'd otherwise suppress* — and what gets suppressed differs by person and situation.
On the honesty side, "Do chatbots help people disclose more intimate secrets?" finds that stripping out social judgment removes the barriers that normally keep people from disclosing intimate things. People open up to a chatbot precisely because nothing is sizing them up. Notably, the therapeutic value comes from the user's own act of putting things into words, not from any understanding on the machine's part. The judgment-free zone is a permission slip to be candid.
But the very same permission slip works for deception. "Do dishonest people prefer talking to machines?" shows that people inclined to cheat actively *prefer* reporting to a machine rather than a human, because lying to a form carries less psychological burden than lying to a face. The removed barrier is the same, social judgment, but here it lubricates dishonesty rather than honesty. So the answer to the literal question is closer to "yes, symmetrically": reducing judgment helps whatever a person was already disposed to hold back, whether that's a painful truth or a self-serving lie.
What makes this more than a curiosity is that social judgment isn't only something the speaker feels; the listener takes part in it too. "Do liars and listeners coordinate their language during deception?" shows deception is partly *coordinated*: liars and listeners drift toward matching language, so the listener's own adaptive behavior becomes a deception signal. Remove the human listener entirely, as a machine interface does, and you remove the interactive friction that deception normally has to work around. The machine never leans in, never reacts, never raises the social stakes, which is exactly why it's comfortable for both the confessor and the cheat.
There's a sharper undertone worth carrying away: judgment-free interfaces optimize for *volume and candor of expression*, not for truth. If you want a system that actually pulls toward honesty rather than just toward disclosure, the lever is elsewhere, in how the model itself represents truth. "Can a model be truthful without actually being honest?" shows truthfulness (matching reality) and honesty (matching one's internal state) are mechanistically separate, and "Can three-way rewards fix the accuracy versus abstention problem?" shows you can deliberately train a model to abstain rather than fabricate. Removing social judgment frees people to speak; it does nothing to make what they say true. Those are different problems with different fixes.
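The three-way reward mentioned above (correct +1, hallucination -1, abstention somewhere in between) can be sketched in a few lines. This is an illustrative sketch only, not TruthRL's actual implementation: the names `Outcome`, `ternary_reward`, and `binary_reward` are hypothetical, the abstention value of 0.0 is an assumption (the source note only says "intermediate"), and the binary baseline is assumed to score abstention the same as a wrong answer.

```python
from enum import Enum


class Outcome(Enum):
    """How a single model response was graded (hypothetical labels)."""
    CORRECT = "correct"
    ABSTAIN = "abstain"              # e.g. the model says "I don't know"
    HALLUCINATION = "hallucination"  # a confident but wrong answer


# Assumption: the source only says abstention is "intermediate".
# 0.0 is a placeholder strictly between -1 and +1.
ABSTAIN_REWARD = 0.0


def ternary_reward(outcome: Outcome, abstain_reward: float = ABSTAIN_REWARD) -> float:
    """Three-way reward: abstaining beats fabricating but loses to being right."""
    if not -1.0 < abstain_reward < 1.0:
        raise ValueError("abstain_reward must lie strictly between -1 and +1")
    if outcome is Outcome.CORRECT:
        return 1.0
    if outcome is Outcome.HALLUCINATION:
        return -1.0
    return abstain_reward


def binary_reward(outcome: Outcome) -> float:
    """Assumed binary baseline: anything other than a correct answer scores -1."""
    return 1.0 if outcome is Outcome.CORRECT else -1.0


if __name__ == "__main__":
    for outcome in Outcome:
        print(outcome.value, ternary_reward(outcome), binary_reward(outcome))
```

Under the assumed binary baseline, abstaining and fabricating earn the same score, so a policy has no reason to prefer "I don't know" over a guess. The intermediate value is what gives abstention its own learning signal.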
Sources (5 notes)
The absence of social judgment in chatbot interactions removes barriers to self-disclosure that normally constrain conversation with humans. The therapeutic benefit derives from the user's own cognitive processing during disclosure, not from the chatbot's understanding.
Experimental evidence shows people likely to cheat significantly prefer reporting to online forms rather than humans, because machines function as judgment-free zones where deception carries less psychological burden.
Research shows interlocutors' linguistic styles correlate more during false communication than truthful communication, especially when the speaker is motivated to deceive. This coordination serves as a detectable deception signal through the listener's adaptive behavior, not just the liar's language.
Research using representation engineering (RepE) shows that truthfulness (output matches reality) and honesty (output matches internal representations) are separate mechanisms. Larger models may improve in truthfulness while declining in honesty, a gap current benchmarks cannot detect.
TruthRL uses three distinct rewards (correct +1, hallucination -1, abstention intermediate) to make abstention learnable. Across four benchmarks, this reduced hallucinations by 28.9% and improved truthfulness by 21.1% compared to binary reward RL.