Truth or lie: Exploring the language of deception


Lying appears in everyday oral and written communication. As a consequence, detecting it on the basis of linguistic analysis is particularly important. Our study aimed to verify whether the differences between true and false statements in terms of complexity and sentiment that were reported in previous studies can be confirmed using tools dedicated to measuring those factors. Further, we investigated whether linguistic features that differentiate true and false utterances in English (namely utterance length, concreteness, and particular parts of speech) are also present in the Polish language. We analyzed nearly 1,500 true and false statements, half of which were transcripts while the other half were written statements. Our results show that false statements are less complex in terms of vocabulary, are more concise and concrete, and have more positive words and fewer negative words. We found no significant differences between spoken and written lies. Using these data, we built classifiers to automatically distinguish true from false utterances, achieving an accuracy of 60%. Our results provide a significant contribution to previous conclusions regarding linguistic deception indicators.

Introduction. Lying is a part of everyday human communication, and most of us engage in it: studies show that people tell an average of one to two lies per day [1]. It has been reported that lies are also increasingly common in computer-mediated communication and in text-based interactions [2]. As a consequence, their detection through language analysis is critical in contexts that rely on truthful inputs. To date, many studies have attempted to capture the differences between true and false statements. These differences may be related to specific types of emotions experienced by liars, cognitive processes occurring while lying, and self-presentation strategies that liars use to control their behavior [3]. According to the emotional approach, lying can trigger emotions such as excitement, fear, and guilt [4]. These emotions can influence the behavior of liars and how they speak, e.g., by increasing the use of words with emotional tones or negations [5]. The cognitive approach emphasizes that lying is more cognitively demanding than telling the truth.

Discussion / Conclusion. Our research attempted to determine whether the linguistic differences found between true and false statements in previous studies conducted in English are also present in Polish. The biggest difference between English and Polish is the rich morphosyntax of the latter, as well as [...] The model we trained achieved results in classifying true and false statements similar to those of analogous models for the English language [10]. However, machine learning models using predefined lexicon-based features do not achieve state-of-the-art results in automatic deception detection. The best results are currently obtained by models based on deep, pre-trained transformer neural networks. To improve our results, it would probably be useful to include audio-prosodic characteristics of statements. As Chen’s [36] research shows, including disfluencies and prosody significantly improves the statement recognition performance of automatic models.
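
As a rough illustration of what "predefined lexicon-based features" can look like, the sketch below computes utterance length, a simple vocabulary-complexity measure (type-token ratio), and the share of positive and negative words. It is not the authors' pipeline: the excerpt does not name the tools or lexicons they used, so the function names and the tiny English word lists here are hypothetical placeholders.

```python
import re

# Hypothetical toy lexicons for illustration only; a real study would rely on
# a validated sentiment lexicon for the target language.
POSITIVE = {"good", "happy", "great", "love"}
NEGATIVE = {"bad", "sad", "afraid", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def lexical_features(text):
    """Return simple per-statement features of the kind a classifier could use."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        # Avoid division by zero for empty statements.
        return {
            "length": 0,
            "type_token_ratio": 0.0,
            "positive_share": 0.0,
            "negative_share": 0.0,
        }
    return {
        "length": n,  # utterance length in tokens
        "type_token_ratio": len(set(tokens)) / n,  # crude vocabulary complexity
        "positive_share": sum(t in POSITIVE for t in tokens) / n,
        "negative_share": sum(t in NEGATIVE for t in tokens) / n,
    }
```

Feature dictionaries like these could then be fed to any standard classifier. The type-token ratio is sensitive to statement length, so comparisons across statements of very different lengths would need a length-corrected measure.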

Lines of inquiry this paper opens

Research framings built by reading the notes related to this paper — the questions it feeds into.

What mechanisms enable AI systems to generate and spread false beliefs?

How do adversarial and manipulative prompts attack reasoning models?

How do token-masking patterns distinguish genuine documents from poisoned ones?

What makes AI persuasion effective and how can we counter it?

Can probing methods detect RLHF-induced persuasion in the same way they catch backdoors?

Why do language models struggle with implicit discourse relations?

What percentage of natural language relies on plausible deniability through ambiguous phrasing?

How can language models sustain linguistic synchrony and intersubjectivity during dialogue?

Why can't humans reliably detect AI-generated text despite measurable linguistic signatures?

Does AI fluency substitute for verifiable accuracy in human judgment?

Can users reliably distinguish valid reasoning from plausible-looking deception?

Is model self-awareness based on genuine introspection or pattern matching?

Can lie detection work from just honesty representation vectors?

How do chatbots affect human self-disclosure and emotional engagement?

Do people who might cheat deliberately choose machines to avoid lying to humans?

How do transformer attention mechanisms implement memory and algorithmic functions?

What does it mean to truly attend to someone in conversation?
