ProsocialDialog: A Prosocial Backbone for Conversational Agents
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them. To address this issue, we introduce PROSOCIALDIALOG, the first large-scale multi-turn dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic situations, PROSOCIALDIALOG contains responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, PROSOCIALDIALOG consists of 58K dialogues, with 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales. With this dataset, we introduce a dialogue safety detection module, Canary, capable of generating RoTs given conversational context, and a socially-informed dialogue agent, Prost. Empirical results show that Prost generates more socially acceptable dialogues compared to other state-of-the-art language and dialogue models in both in-domain and out-of-domain settings. Additionally, Canary effectively guides off-the-shelf language models to generate significantly more prosocial responses.
Introduction. State-of-the-art data-driven conversational AI systems are at risk of producing or agreeing with unsafe (i.e., toxic, unethical, rude, or dangerous) content. For example, given the potentially problematic utterance “I saw someone overdose and didn’t tell anyone”, GPT-3 (Brown et al., 2020), BlenderBot (Roller et al., 2021), and OPT (Zhang et al., 2022) all condone this behavior (Figure 1a). Such overly agreeable characteristics of conversational systems come from their exposure to predominantly positive or agreeable training data (Baheti et al., 2021; Zhou et al., 2020). Although such a design choice can uplift user-bot interaction experiences, lacking appropriate strategies to cope with problematic contexts poses serious safety concerns for real-world deployment of conversational AIs (Dinan et al., 2022; Weidinger et al., 2021).
Discussion / Conclusion. We introduced PROSOCIALDIALOG, a large-scale English dialogue dataset providing constructive feedback for prosocial behaviors aligned with commonsense social rules (i.e., rules-of-thumb) across diverse problematic contexts. We proposed a new three-tier dialogue safety schema to differentiate situations requiring human intervention (e.g., emergency) from those requiring careful responses (e.g., biased, unethical). Experiments showed that Prost, a dialogue agent trained on our dataset, can navigate problematic contexts in a more prosocial manner. We also trained a dialogue safety model, Canary, that outputs relevant rules-of-thumb when the context is detected to be not casual. Human evaluation showed that Canary can significantly improve the prosociality and overall quality of large language models’ responses to objectionable contexts.
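As a rough illustration of how a safety module's output might be used to guide an off-the-shelf language model, the sketch below prepends generated rules-of-thumb to the dialogue context whenever the context is not labeled casual. This is a minimal sketch under stated assumptions, not the actual Canary or Prost interface: the names `SafetyLabel`, `SafetyOutput`, and `build_prompt`, the label strings, and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SafetyLabel(Enum):
    # Hypothetical names for the three tiers of the safety schema.
    CASUAL = "casual"
    NEEDS_CAUTION = "needs_caution"            # e.g., biased or unethical context
    NEEDS_INTERVENTION = "needs_intervention"  # e.g., emergency


@dataclass
class SafetyOutput:
    # Assumed shape of a safety module's output: a label plus rules-of-thumb.
    label: SafetyLabel
    rules_of_thumb: List[str] = field(default_factory=list)


def build_prompt(context: List[str], safety: SafetyOutput) -> str:
    """Prepend rules-of-thumb to the dialogue when the context is not casual."""
    dialogue = "\n".join(context)
    if safety.label is SafetyLabel.CASUAL or not safety.rules_of_thumb:
        # Casual contexts are passed through unchanged.
        return dialogue
    guidance = "\n".join(f"Rule of thumb: {rot}" for rot in safety.rules_of_thumb)
    return f"{guidance}\n{dialogue}"
```

The returned string would then be given to a language model as its input. How the rules-of-thumb are formatted in the prompt is a design choice made here for illustration only.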