The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Large language models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research mainly studied this susceptibility behavior in a single-turn setting. However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs’ susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs’ belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs’ correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies1.
Introduction. Large language models (LLMs) are known to encapsulate a substantial volume of knowledge during training (Petroni et al., 2019; Roberts et al., 2020; Kadavath et al., 2022; Zhao et al., 2023; OpenAI, 2023). Prior work has identified that LLMs are susceptible to external information from different sources. For instance, Xie et al. (2023) shows that LLMs can be highly receptive to external evidence even when it conflicts with their memory. Researchers also observe that LLMs tend to tailor their responses even to follow an objectively wrong viewpoint (Perez et al., 2022; Wei et al., 2023b). However, prior work mostly focused on one-turn settings (Pan et al., 2023), but one’s beliefs2 can change through conversational interactions, particularly through persuasion (Crano and Prislin, 2006).
Discussion / Conclusion. From an LLM service provider’s perspective, we aim to prevent LLMs from easily falling prey to misinformation especially for simple facts, as this would undermine the reliability and trustworthiness of the LLM. In this section, we discuss a lightweight prompt-based method to mitigate this issue. After detecting misinformation in the user’s input (may use another LLM), we insert a system prompt as a reminder. This prompt serves to remind the LLM to (1) be cautious with potentially malicious users and (2) verify its memorized knowledge before responding. Our intuition is on two folds. (1) We observe that LLMs tend to assume that the user is well-intentioned when faced with conflicts. (2) The LLM will exhibit stronger resolve when it recalls supporting evidence that reinforces its belief. More details are given in Appendix E. We compare ChatGPT’s performance across all datasets after applying this prompt as a reminder and cast MR@1 and MR@4 in Figure 5. This prompt can significantly reduce the impact of LLM being exposed to misinformation. However, there is still plenty of headroom for improvement in the overall outcome.