DeepCT-enhanced Lexical Argument Retrieval

The recent Touché lab’s argument retrieval task focuses on controversial topics like ‘Should bottled water be banned?’ and asks participants to retrieve relevant pro/con arguments. Interestingly, the most effective systems submitted to that task are still based on lexical retrieval models like BM25. In other domains, neural retrievers that capture semantics are more effective than lexical baselines. To add more “semantics” to argument retrieval, we propose combining lexical models with DeepCT-based document term weights. Our evaluation shows that our approach is more effective than all the systems submitted to the Touché lab while being on par with modern neural re-rankers, which are computationally more expensive.

Introduction. Lexical retrieval models like BM25 (Robertson et al., 1994) or DirichletLM (Zhai and Lafferty, 2001) are the basis of many early argument retrieval approaches (Chernodub et al., 2019; Potthast et al., 2019; Stab et al., 2018; Wachsmuth et al., 2017) and were also the most common choice among participants of the Touché lab’s shared task on argument retrieval for controversial questions (Bondarenko et al., 2020, 2021). A few neural rankers like K-NRM (Xiong et al., 2017) and CEDR (MacAvaney et al., 2019) were used by the task participants but proved less effective than the task’s official DirichletLM-based baseline. Interestingly, newer neural retrieval models like ColBERTv2 (Santhanam et al., 2022) and LaPraDoR (Xu et al., 2022) are also less effective than BM25 on the Touché subset of the BEIR benchmark for zero-shot retrieval (Thakur et al., 2021). In this paper, we propose to improve the effectiveness of lexical argument retrieval models by adding a semantic document expansion step that uses term weights calculated by DeepCT (Dai and Callan, 2020b).

Discussion / Conclusion. In this paper, we proposed to combine lexical retrieval models with semantic document expansion for argument retrieval. Specifically, to calculate the term weights, we fine-tuned DeepCT on the args.me corpus. The main advantages of DeepCT are that the term weights can be calculated offline before document indexing and that its training does not require manual relevance judgments. This is especially important in specialized domains (e.g., argument retrieval), where little or no training data is available. Furthermore, at query time only lexical retrieval models, which require fewer computational resources than neural models, are run on the expanded documents. Our evaluation results showed that adding some “semantics” to strong lexical argument retrieval approaches improves overall effectiveness over lexical retrieval alone. Additionally, we showed that our approach is on par with modern neural re-rankers, which can be more computationally expensive. However, we also indicated that a more robust conclusion requires further experiments in which the missing relevance judgments are filled in.
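The two-stage idea described above (term weights computed offline, then purely lexical scoring at query time) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the term weights are made-up numbers standing in for DeepCT predictions, `expand_document` and `bm25_scores` are hypothetical helper names, the `scale` of 100 is an arbitrary choice, and the BM25 variant and parameters are common textbook defaults, not necessarily the configuration used in the paper.

```python
import math
from collections import Counter


def expand_document(term_weights, scale=100):
    """Build a pseudo-document by repeating each term in proportion to its weight.

    term_weights maps a term to a weight in [0, 1]; in the paper these would
    come from a fine-tuned DeepCT model, here they are hand-written placeholders.
    """
    expanded = []
    for term, weight in term_weights.items():
        expanded.extend([term] * round(weight * scale))
    return expanded


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenized document in docs against the query with plain BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Offline step: placeholder weights for two toy arguments (not real DeepCT output).
weights_a = {"bottled": 0.9, "water": 0.8, "ban": 0.7, "plastic": 0.3}
weights_b = {"water": 0.1, "ban": 0.05, "tax": 0.9, "sugar": 0.8}
index = [expand_document(weights_a), expand_document(weights_b)]

# Query time: only the lexical model runs, on the expanded documents.
scores = bm25_scores(["bottled", "water", "ban"], index)
print(scores)
```

In this toy setup the first argument scores higher because the query terms carry large weights in it, while the second argument mentions "water" and "ban" only with small weights. No neural model is invoked at query time, which mirrors the efficiency argument made in the conclusion.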

Lines of inquiry this paper opens

Research framings built by reading the notes related to this paper — the questions it feeds into.

Does conversational format create illusions of genuine AI communication?

How does AI-generated content transformation affect public discourse quality?

What makes AI persuasion effective and how can we counter it?

Can AI systems balance emotional competence with factual reliability?

How does rapport-building language persist across all GenAI validation responses?

How should human oversight be integrated with autonomous AI systems?

Can humans develop oversight strategies that work across all GenAI rhetorical shifts?

Does RLHF training sacrifice accuracy and grounding for user agreement?

What training methods make models more persuasive but less factually accurate?

How does rhetorical adaptation affect LLM persuasion and detectability?

Why can't humans reliably detect AI-generated text despite measurable linguistic signatures?

What linguistic cues help humans detect whether moral arguments come from AI?
