Do humans and LLMs exhibit opposite biases in public versus private reviews?
This explores whether the *direction* of review bias flips between humans and LLMs — specifically, humans skewing negative when an audience is watching, while LLMs skew the opposite way (toward polite positivity) regardless of who's looking.
The corpus suggests that humans and LLMs do lean in opposite directions when writing reviews, but for entirely different reasons. On the human side, reviewers actually get *more* negative in public. The note "Why do online reviewers publish negative ratings despite positive experiences?" found that people systematically lower their ratings after reading negative reviews, even when their own experience was positive, because negative reviewers come across as more intelligent. Crucially, this only happens in front of an audience: private raters show no such shift. So for humans, 'public' is the condition that pulls toward negativity, and the mechanism is self-presentation, not honest dissatisfaction.
LLMs start from the opposite default. Off-the-shelf models generate inappropriately *positive* reviews even when the underlying user hated the product, because RLHF alignment training bakes in a politeness bias. The note "Why do LLMs generate polite reviews even when users hated products?" shows this floor is hard to escape, and "Can user history override an LLM's politeness bias in reviews?" shows what it takes to break it: you have to feed the model the user's prior reviews and rating signals *and* fine-tune on those examples before it will write an authentically negative review. The bias isn't audience-driven; it's a property of how the model was trained to be agreeable.
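As a rough illustration of the "feed the model the user's prior reviews and rating signals" step, the sketch below assembles such a prompt. The `PastReview` type, the `build_review_prompt` function, and the prompt wording are all hypothetical, chosen for this example; they are not the template used in the cited work, and the fine-tuning step is not shown.

```python
from dataclasses import dataclass


@dataclass
class PastReview:
    """One earlier review by the same user (hypothetical structure)."""
    item_title: str
    rating: int  # star rating, 1 (worst) to 5 (best)
    text: str


def build_review_prompt(history, target_item, target_rating):
    """Build a prompt exposing the user's history and the target rating.

    The layout is an assumption made for illustration only.
    """
    if not 1 <= target_rating <= 5:
        raise ValueError("target_rating must be between 1 and 5")

    lines = ["Past reviews by this user:"]
    for review in history:
        lines.append(f"- {review.item_title} ({review.rating}/5): {review.text}")
    lines.append("")
    lines.append(f"Item to review: {target_item}")
    # The explicit rating acts as the satisfaction signal for the model.
    lines.append(f"User's rating for this item: {target_rating}/5")
    lines.append("Write the review this user would write, matching the rating.")
    return "\n".join(lines)


history = [PastReview("Desk lamp", 2, "Flickers after a week.")]
print(build_review_prompt(history, "USB hub", 1))
```

The explicit low rating is placed in the prompt so that the model has a stated reason to depart from its polite default. According to the notes, context like this works only in combination with fine-tuning on such examples.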
So the 'opposite' framing holds, but the public/private axis isn't really the same axis for both. Humans flip toward negativity because of who's watching. LLMs sit at a positivity floor because of how they were aligned. Nothing in these notes ties that politeness to an audience: it shows up everywhere until you override it. The contrast is less 'public vs private' and more 'social-signaling pressure vs alignment-induced agreeableness.'
The deeper point worth taking away: this LLM positivity floor isn't confined to reviews. The note "Does emotional tone in prompts change what information LLMs provide?" documents the same pull in ordinary conversation. GPT-4 converts negative-toned prompts into neutral-to-positive responses roughly 86% of the time (an 'emotional rebound') and almost never lets a positive prompt turn negative (a 'tone floor'); the effect is suppressed only on sensitive topics, where alignment constraints override tone. That's the same agreeableness bias surfacing in conversation. It suggests that an LLM asked to summarize sentiment, draft feedback, or mediate a complaint is likely to sand off the negative edge quietly, which is exactly the edge a human reviewer would *sharpen* when others are watching. If you're using LLMs to generate or aggregate reviews at scale, you're not getting a neutral instrument; you're swapping a human negativity bias for a machine positivity bias, and the two distort the signal in opposite directions.
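To make the two rates concrete, here is a minimal sketch of how a rebound rate and a tone-floor breach rate could be computed from tone-labeled prompt/response pairs. The function name, the three-way labeling scheme, and the toy data are assumptions for illustration; they are not the cited study's method or its data.

```python
from collections import Counter


def tone_transition_rates(pairs):
    """Compute tone-shift rates from (prompt_tone, response_tone) pairs.

    Tones are assumed to be 'negative', 'neutral', or 'positive'.
    A rate is None when no prompts with the relevant tone were seen.
    """
    counts = Counter(pairs)

    def share(prompt_tone, response_tones):
        total = sum(n for (p, _), n in counts.items() if p == prompt_tone)
        if total == 0:
            return None
        hits = sum(
            n for (p, r), n in counts.items()
            if p == prompt_tone and r in response_tones
        )
        return hits / total

    return {
        # Negative prompts answered in a neutral or positive tone.
        "rebound": share("negative", {"neutral", "positive"}),
        # Positive prompts answered in a negative tone.
        "floor_breach": share("positive", {"negative"}),
    }


# Toy labels invented for illustration; not data from the cited note.
toy_pairs = [
    ("negative", "positive"),
    ("negative", "neutral"),
    ("negative", "negative"),
    ("negative", "neutral"),
    ("positive", "positive"),
    ("positive", "neutral"),
]
print(tone_transition_rates(toy_pairs))  # {'rebound': 0.75, 'floor_breach': 0.0}
```

On this reading, the roughly 86% figure corresponds to a `rebound` value near 0.86, and "almost never" corresponds to a `floor_breach` value near zero. How tones were actually labeled in the study is not stated in the notes.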
Sources (4 notes)
Posters systematically reduce their ratings in public when exposed to negative reviews, even with positive personal experience—because negative reviewers appear more intelligent. Private raters show no such shift, revealing a self-presentational mechanism tied to multiple-audience communication.
Off-the-shelf LLMs generate inappropriately positive reviews due to alignment-training politeness bias. Combining user review history, rating signals as satisfaction indicators, and supervised fine-tuning successfully redirects the model to generate negative reviews when warranted.
Review-LLM defeats the politeness bias inherent in RLHF-trained models by aggregating user behavior sequences (prior reviews, item ratings) in the prompt and fine-tuning on these contextualized examples. This dual intervention—personalized context plus explicit satisfaction signals—allows the model to generate authentically negative reviews matching user dissatisfaction.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.