Truth or lie: Exploring the language of deception


Lying appears in everyday oral and written communication, so detecting it through linguistic analysis is particularly important. Our study aimed to verify whether the differences in complexity and sentiment between true and false statements reported in previous studies can be confirmed using tools dedicated to measuring those factors. We also investigated whether the linguistic features that differentiate true and false utterances in English (namely utterance length, concreteness, and particular parts of speech) are also present in Polish. We analyzed nearly 1,500 true and false statements, half of which were transcripts of spoken statements and half of which were written statements. Our results show that false statements are less complex in terms of vocabulary, are more concise and concrete, and contain more positive words and fewer negative words. We found no significant differences between spoken and written lies. Using these data, we built classifiers to automatically distinguish true from false utterances, achieving an accuracy of 60%. Our results make a significant contribution to previous conclusions regarding linguistic indicators of deception.
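The abstract names several measurable properties of a statement: length, vocabulary complexity, concreteness, and the proportion of positive and negative words. The following is a minimal sketch of how such lexicon-based features might be computed for a single statement. It rests on several assumptions: letter-only tokenization, type-token ratio as one common proxy for vocabulary complexity (not necessarily the measure the study used), and tiny hypothetical English lexicons (`POSITIVE`, `NEGATIVE`, `CONCRETENESS`, with made-up illustrative ratings) in place of the dedicated Polish tools the authors used.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons for illustration only. The study used dedicated
# tools for Polish; these words and ratings are NOT taken from any real norm.
POSITIVE = {"good", "great", "happy", "nice", "love"}
NEGATIVE = {"bad", "sad", "angry", "hate", "terrible"}
CONCRETENESS = {"table": 4.9, "car": 4.9, "dog": 4.9, "idea": 1.6, "truth": 1.5}


@dataclass
class Features:
    n_tokens: int             # utterance length in tokens
    type_token_ratio: float   # lexical diversity, a proxy for vocabulary complexity
    positive_rate: float      # share of tokens found in POSITIVE
    negative_rate: float      # share of tokens found in NEGATIVE
    mean_concreteness: float  # mean rating over tokens found in CONCRETENESS


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of Unicode letters (so Polish diacritics survive);
    # digits and underscores are excluded.
    return re.findall(r"[^\W\d_]+", text.lower())


def extract_features(text: str) -> Features:
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return Features(0, 0.0, 0.0, 0.0, 0.0)
    rated = [CONCRETENESS[t] for t in tokens if t in CONCRETENESS]
    return Features(
        n_tokens=n,
        type_token_ratio=len(set(tokens)) / n,
        positive_rate=sum(t in POSITIVE for t in tokens) / n,
        negative_rate=sum(t in NEGATIVE for t in tokens) / n,
        mean_concreteness=sum(rated) / len(rated) if rated else 0.0,
    )
```

For example, `extract_features("The dog was good, a good dog.")` yields 7 tokens, a type-token ratio of 5/7, and a positive-word rate of 2/7. Normalizing counts by length keeps the sentiment rates comparable between short and long statements. This matters here because the abstract reports that false statements tend to be more concise.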

Introduction. Lying is part of everyday human communication, and most of us engage in it: studies show that people tell an average of one to two lies per day [1]. Lies have also been reported to be increasingly common in computer-mediated communication and text-based interactions [2]. Consequently, detecting them through language analysis is critical in contexts that rely on truthful input. To date, many studies have attempted to capture the differences between true and false statements. These differences may be related to the specific emotions liars experience, the cognitive processes that occur while lying, and the self-presentation strategies liars use to control their behavior [3]. According to the emotional approach, lying can trigger emotions such as excitement, fear, and guilt [4]. These emotions can influence liars' behavior and speech, for example by increasing their use of emotionally toned words or negations [5]. The cognitive approach emphasizes that lying is more cognitively demanding than telling the truth.

Discussion / Conclusion. Our research attempted to determine whether the linguistic differences between true and false statements found in previous English-language studies are also present in Polish. The biggest difference between English and Polish is the rich morphosyntax of the latter, as well as […] The model we trained achieved results in classifying true and false statements similar to those of analogous English-language models [10]. However, machine learning models that use predefined lexicon-based features do not achieve state-of-the-art results in automatic deception detection; the best results are currently obtained by models based on deep, pre-trained transformer neural networks. Including audio-prosodic characteristics of the statements would probably improve our results: as Chen's research [36] shows, including disfluencies and prosody significantly improves the performance of automatic models in classifying statements.
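To make the "predefined lexicon-based features" side of this comparison concrete, the following is a minimal sketch of a classifier trained on per-statement numeric feature vectors, such as length, lexical diversity, and sentiment rates. The excerpt does not say which classifier family the authors used. Plain logistic regression trained by full-batch gradient descent is swapped in here as an illustrative assumption, and all function names are hypothetical.

```python
import math

# Minimal logistic-regression sketch (stdlib only). This is a generic,
# swapped-in classifier and is not presented as the authors' model.


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_standardizer(X: list[list[float]]) -> tuple[list[float], list[float]]:
    # Per-feature mean and population standard deviation from training data.
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # A constant feature has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def standardize(X, means, stds):
    # Put features with very different scales (token counts vs. rates) on par.
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    # Full-batch gradient descent on the mean log-loss.
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x) -> int:
    # Label 1 = "false statement", 0 = "true statement" (arbitrary convention).
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def accuracy(w, b, X, y) -> float:
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

In a real evaluation, the standardizer and weights would be fit on a training split only, and accuracy would be measured on held-out statements. For reference, on a set with equal numbers of true and false statements, chance accuracy is 50%.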