DeepCT-enhanced Lexical Argument Retrieval
The recent Touché lab’s argument retrieval task focuses on controversial topics like ‘Should bottled water be banned?’ and asks participants to retrieve relevant pro/con arguments. Interestingly, the most effective systems submitted to that task are still based on lexical retrieval models like BM25. In other domains, neural retrievers that capture semantics are more effective than lexical baselines. To add more “semantics” to argument retrieval, we propose to combine lexical models with DeepCT-based document term weights. Our evaluation shows that our approach is more effective than all the systems submitted to the Touché lab while being on par with modern neural re-rankers, which are computationally more expensive.
Introduction. Lexical retrieval models like BM25 (Robertson et al., 1994) or DirichletLM (Zhai and Lafferty, 2001) are the basis of many early argument retrieval approaches (Chernodub et al., 2019; Potthast et al., 2019; Stab et al., 2018; Wachsmuth et al., 2017) and were also the most common choice among participants of the Touché lab’s shared task on argument retrieval for controversial questions (Bondarenko et al., 2020, 2021). A few task participants used neural rankers like K-NRM (Xiong et al., 2017) and CEDR (MacAvaney et al., 2019), but these proved less effective than the task’s official DirichletLM-based baseline. Interestingly, newer neural retrieval models like ColBERTv2 (Santhanam et al., 2022) and LaPraDoR (Xu et al., 2022) are also less effective than BM25 on the Touché subset of the BEIR benchmark for zero-shot retrieval (Thakur et al., 2021). In this paper, we propose to improve the effectiveness of lexical argument retrieval models by adding a semantic document expansion step that uses term weights calculated by DeepCT (Dai and Callan, 2020b).
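To make the expansion step concrete, the following is a minimal sketch of one way that per-term weights can be turned into a pseudo-document that a standard lexical index consumes unchanged. It assumes the weights lie in [0, 1] and have already been predicted by a model such as DeepCT; the function name `expand_document`, the `scale` parameter, and the example weights are illustrative assumptions, not values taken from our experiments.

```python
from collections import Counter


def expand_document(term_weights, scale=100):
    """Build a pseudo-document in which each term is repeated in
    proportion to its (assumed) importance weight in [0, 1].

    `scale` is a hypothetical constant that maps a weight to an
    integer term frequency; terms whose scaled weight rounds to 0
    are dropped from the pseudo-document.
    """
    tokens = []
    for term, weight in term_weights.items():
        # Clamp so that out-of-range predictions cannot yield negative counts.
        clamped = min(max(weight, 0.0), 1.0)
        count = int(round(clamped * scale))
        tokens.extend([term] * count)
    return " ".join(tokens)


# Made-up weights for a single argument about bottled water.
weights = {"bottled": 0.42, "water": 0.31, "banned": 0.18, "the": 0.001}
expanded = expand_document(weights, scale=10)
term_frequencies = Counter(expanded.split())
```

Because the output is plain text with repeated terms, it can be indexed by any lexical retrieval system without modifying the ranking function itself.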
Discussion / Conclusion. In this paper, we proposed to combine lexical retrieval models with semantic document expansion for argument retrieval. Specifically, to calculate the term weights, we fine-tuned DeepCT on the args.me corpus. The main advantages of DeepCT are that the term weights can be calculated offline before document indexing and that its training does not require manual relevance judgments. This is especially important in specialized domains (e.g., argument retrieval) where little or no training data is available. Furthermore, at query time, only lexical retrieval models are run on the expanded documents, which requires fewer computational resources than neural models. Our evaluation results showed that adding some “semantics” to strong lexical argument retrieval approaches improves overall effectiveness over lexical retrieval alone. Additionally, we showed that our approach is on par with modern neural re-rankers, which can be more computationally expensive. However, we also noted that a more robust conclusion requires further experiments in which the missing relevance judgments are filled in.
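The query-time side of this design can be illustrated with a small, self-contained BM25 scorer that operates on already expanded, tokenized documents. This is a sketch under stated assumptions rather than the implementation used in our experiments: BM25 stands in for any lexical model, the function name `bm25_scores` and the toy documents are hypothetical, and `k1` and `b` are set to commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with BM25.

    `docs` is a non-empty list of non-empty token lists, e.g. pseudo-documents
    whose term frequencies were derived offline from term weights.
    """
    tfs = [Counter(doc) for doc in docs]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    scores = []
    for doc, tf in zip(docs, tfs):
        score = 0.0
        for term in query_terms:
            # Document frequency: number of documents containing the term.
            df = sum(1 for other in tfs if term in other)
            if df == 0 or tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization relative to the average document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy expanded documents; repeated tokens encode the (made-up) term weights.
docs = [
    ["water"] * 4 + ["banned"] * 2,
    ["water"] * 1 + ["plastic"] * 5,
    ["plastic"] * 6,
]
scores = bm25_scores(["water", "banned"], docs)
```

No neural model is invoked in this step: all term-weight information is already encoded in the stored term frequencies, which is why the query-time cost stays that of plain lexical retrieval.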