Transformer-based cynical expression detection in a corpus of Spanish YouTube reviews


Consumers of services and products exhibit a wide range of behaviors on social networks when they are dissatisfied. In this paper, we consider three types of cynical expressions (negative feelings, specific reasons, and attitude of being right) and annotate a corpus of 3189 Spanish-language comments from car analysis channels on YouTube. We evaluate both token classification and text classification settings for this problem, and compare the performance of different pre-trained models, including BETO, SpanBERTa, Multilingual BERT, and RoBERTuito. The results show that models achieve performance above 0.8 F1 for all types of cynical expressions in the text classification setting, but lower performance (around 0.6-0.7 F1) in the harder token classification setting.

Introduction. Consumers of services and products actively engage through social networks when they are dissatisfied, exhibiting a wide range of behaviors (Encinas and Cavazos, 2021). Encinas presents a classification of dysfunctional consumer behaviors: mild behaviors such as rudeness, complaints, skepticism, or tantrums; moderate behaviors such as manifestations of cynicism, attempts at manipulation, or inappropriate comments and foul language; and intense behaviors such as fraud, theft, verbal aggression, or revenge. We focus on cynical expressions of consumers, specifically in comments posted on videos on the YouTube platform. Cynicism is a negative attitude with a broad or specific focus that comprises cognitive, affective, and behavioral components (Chylinski and Chu, 2010). Consumer cynicism can generate feelings of betrayal and deception, leading to anger and the desire to stop purchasing products or services from the source of that anger (Encinas and Cavazos, 2021). Within expressions of cynicism, we focus on three specific expressions: negative feelings, specific reasons, and attitude of being right.

Discussion / Conclusion. The experimental results show that the three cynical expressions can be detected with reasonable reliability. We discuss some of the results below. Performance was higher on the easier text classification task and lower on the more challenging token classification task. However, token classification is closer to the objective of this work: detecting exactly which part of the comment constitutes the cynical expression. To extend the success of the text classification setting to the token classification setting, it may be useful to investigate two-stage approaches, where text classification first identifies the broad region containing a cynical expression and token classification then narrows it down to the specific phrases. The experiments with RoBERTuito show that simply using a model trained for hate speech detection does not solve cynical expression detection, even in the related category of negative feelings: a non-fine-tuned RoBERTuito achieves only 0.671 F1, while a fine-tuned mBERT achieves 0.925 F1.
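The two-stage idea mentioned in the discussion could be sketched roughly as follows. This is only an illustration, not the authors' method: the function `two_stage_detect`, the `threshold` parameter, the label identifiers, the choice of sentences as the "broad region", and the keyword-based toy models are all assumptions introduced here. In practice, each stage would presumably wrap a fine-tuned transformer rather than keyword rules.

```python
from typing import Callable, Dict, List, Tuple

# Label identifiers are hypothetical; the names follow the paper's three types.
LABELS = ["negative_feelings", "specific_reasons", "attitude_of_being_right"]

# Hypothetical interfaces for the two stages.
TextClassifier = Callable[[str], Dict[str, float]]    # region -> score per label
TokenTagger = Callable[[List[str], str], List[bool]]  # tokens, label -> per-token flags


def two_stage_detect(
    sentences: List[str],
    text_clf: TextClassifier,
    token_tagger: TokenTagger,
    threshold: float = 0.5,  # assumed cut-off for stage 1
) -> List[Tuple[int, str, str]]:
    """Stage 1 selects candidate sentences per label; stage 2 tags tokens only there."""
    found = []
    for idx, sentence in enumerate(sentences):
        scores = text_clf(sentence)  # stage 1: coarse text classification
        tokens = sentence.split()    # whitespace tokenization, for illustration only
        for label in LABELS:
            if scores.get(label, 0.0) < threshold:
                continue             # skip regions the text classifier rejects
            flags = token_tagger(tokens, label)  # stage 2: token classification
            phrase = [tok for tok, keep in zip(tokens, flags) if keep]
            if phrase:
                found.append((idx, label, " ".join(phrase)))
    return found


# Toy stand-ins so the sketch runs without any trained model.
def toy_text_clf(sentence: str) -> Dict[str, float]:
    text = sentence.lower()
    return {
        "negative_feelings": 1.0 if "hate" in text else 0.0,
        "specific_reasons": 1.0 if "because" in text else 0.0,
        "attitude_of_being_right": 0.0,
    }


def toy_token_tagger(tokens: List[str], label: str) -> List[bool]:
    lowered = [tok.lower().strip(".,!?") for tok in tokens]
    if label == "negative_feelings":
        return [tok in {"hate", "awful"} for tok in lowered]
    if label == "specific_reasons" and "because" in lowered:
        start = lowered.index("because")
        return [i >= start for i in range(len(tokens))]
    return [False] * len(tokens)


comment = ["Nice video overall.", "I hate this brand because the gearbox failed twice."]
result = two_stage_detect(comment, toy_text_clf, toy_token_tagger)
for sentence_index, label, phrase in result:
    print(sentence_index, label, phrase)
```

One design consequence of such a pipeline is that stage 2 never sees regions rejected by stage 1, so the threshold trades missed expressions against the work and false positives passed on to the token tagger.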
