LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools

Paper · arXiv 2401.12576 · Published January 23, 2024

Interpretability tools that offer explanations in the form of a dialogue have demonstrated their efficacy in enhancing users’ understanding (Slack et al., 2023; Shen et al., 2023), as one-off explanations may occasionally fall short in providing sufficient information to the user. Current solutions for dialogue-based explanations, however, require many dependencies and are not easily transferable to tasks they were not designed for. With LLMCHECKUP, we present an easily accessible tool that allows users to chat with any state-of-the-art large language model (LLM) about its behavior. We enable LLMs to generate all explanations by themselves and take care of intent recognition without fine-tuning, by connecting them with a broad spectrum of Explainable AI (XAI) tools, e.g., feature attributions, embedding-based similarity, and prompting strategies for counterfactual and rationale generation. LLM (self-)explanations are presented as an interactive dialogue that supports follow-up questions and generates suggestions. LLMCHECKUP provides tutorials for the operations available in the system, catering to individuals with varying levels of expertise in XAI, and supports multiple input modalities.

Introduction. To unravel the black box nature of deep learning models for natural language processing, a diverse range of explainability methods has been developed (Ribeiro et al., 2016; Madsen et al., 2022; Wiegreffe et al., 2022). Nevertheless, practitioners often face difficulties in effectively utilizing explainability methods, as they may not be aware of which techniques are available or how to interpret the results provided. There has been a consensus within the research community that moving beyond one-off explanations and embracing conversations to provide explanations is more effective for model understanding (Lakkaraju et al., 2022; Feldhus et al., 2023; Zhang et al., 2023) and helps mitigate, to some extent, the limitations associated with the effective usage of explainability methods (Ferreira and Monteiro, 2020; Slack et al., 2023). In the field of NLP, two dialogue-based interpretability tools, INTERROLANG (Feldhus et al., 2023) and CONVXAI (Shen et al., 2023), have been introduced.

Discussion / Conclusion. We highlight the broad generalizability of LLMCHECKUP across diverse NLP tasks and models. Our approach empowers any auto-regressive LLM to handle dialogue-based explainability, including parsing and explanation generation. Aside from the NLP tasks showcased in Section 4, LLMCHECKUP can be easily expanded to accommodate other tasks not directly related to classification, such as summarization or translation. In contrast to CONVXAI (Shen et al., 2023) and INTERROLANG [...] used in LLMCHECKUP possess remarkable zero-/few-shot capabilities (Brown et al., 2020), which allow them to effectively handle many tasks without requiring fine-tuning. Although the quality of an explanation could be enhanced with further fine-tuning, LLMCHECKUP uses model outputs out of the box. We present the interpretability tool LLMCHECKUP, designed as a dialogue-based system. LLMCHECKUP can provide explanations in a conversation with the user facilitated by any auto-regressive LLM.
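
The idea of recognizing user intent through few-shot prompting rather than fine-tuning can be illustrated with a minimal sketch. This is not the LLMCHECKUP implementation: the operation names, the example questions, and the `generate` callable (a stand-in for any auto-regressive LLM's text-completion function) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical operation names and descriptions, for illustration only;
# they are not taken from the LLMCheckup code base.
OPERATIONS: Dict[str, str] = {
    "feature_attribution": "Show which input tokens mattered most for the prediction.",
    "similar_instances": "Retrieve instances close to the input in embedding space.",
    "counterfactual": "Generate a minimally edited input that changes the prediction.",
    "rationale": "Explain the prediction in natural language.",
}

# Hypothetical few-shot demonstrations pairing a question with an operation.
FEW_SHOT: List[Tuple[str, str]] = [
    ("Which words were most important here?", "feature_attribution"),
    ("How would I have to change the text to flip the label?", "counterfactual"),
]


def build_prompt(question: str) -> str:
    """Assemble a few-shot prompt that asks the model for one operation name."""
    lines = ["Map the user question to exactly one operation name.", "", "Operations:"]
    lines += [f"- {name}: {desc}" for name, desc in OPERATIONS.items()]
    lines.append("")
    for demo_question, demo_operation in FEW_SHOT:
        lines += [f"Question: {demo_question}", f"Operation: {demo_operation}", ""]
    lines += [f"Question: {question}", "Operation:"]
    return "\n".join(lines)


def parse_intent(question: str, generate: Callable[[str], str]) -> str:
    """Return the recognized operation name, or "unknown" if none matches."""
    completion = generate(build_prompt(question)).strip().lower()
    # Return the first operation (in OPERATIONS order) whose name occurs
    # anywhere in the model's completion.
    for name in OPERATIONS:
        if name in completion:
            return name
    return "unknown"


if __name__ == "__main__":
    # Stub standing in for a real LLM call, so the sketch runs offline.
    def fake_generate(prompt: str) -> str:
        return " counterfactual\n"

    print(parse_intent("What change would flip this prediction?", fake_generate))
```

Because the model is only prompted, swapping in a different LLM means replacing the `generate` callable; no task-specific training step is involved in this sketch.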
