Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach that prompts large language models to meta-reason over multiple chains of thought, rather than aggregate their answers. MCR examines different reasoning chains, mixes information between them, and selects the most relevant facts in generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify its answers.
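As a rough illustration of what "meta-reasoning over multiple chains" could look like at the input level, the sketch below gathers the intermediate steps of several sampled chains into a single context for a second model call. The function name `build_multi_chain_context`, the `[chain i]` tags, and the prompt wording are all hypothetical and not taken from the paper; the actual MCR prompt format may differ.

```python
from typing import List


def build_multi_chain_context(question: str, chains: List[List[str]]) -> str:
    """Concatenate the intermediate steps of every chain into one prompt.

    Hypothetical format (an assumption, not the paper's prompt): each step
    is tagged with the index of the chain it came from, so that a
    meta-reasoner model can combine facts across chains instead of seeing
    only the final answers.
    """
    lines = []
    for i, chain in enumerate(chains, start=1):
        for step in chain:
            lines.append(f"[chain {i}] {step}")
    lines.append(f"Question: {question}")
    lines.append("Explanation and answer:")
    return "\n".join(lines)


# Usage with placeholder steps (not real model output).
context = build_multi_chain_context(
    "Did Brad Peyton need to know about seismology?",
    [["step A1", "step A2"], ["step B1"]],
)
```

The point of the sketch is only that the intermediate steps, which answer-level voting discards, are kept and exposed to the model that produces the final explanation and answer.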
Introduction. In chain-of-thought (CoT) prompting, a large language model (Brown et al., 2020; Chowdhery et al., 2022; Kadavath et al., 2022; Touvron et al., 2023) is prompted to generate its answer following a step-by-step explanation (Wei et al., 2022; Nye et al., 2022). CoT prompting has been shown to dramatically improve performance on reasoning-heavy tasks (Kojima et al., 2022; Zhou et al., 2022). Furthermore, Wang et al. (2023) showed that sampling multiple chains of thought and returning their majority output further improves accuracy, a method which they term self-consistency (SC). While SC leads to performance gains, it also has several shortcomings. First, when the space of possible outputs is large (Kalyan et al., 2021), each reasoning chain may lead to a different output, in which case no significant majority will be formed. Second, focusing exclusively on the final output discards relevant information that is present in the intermediate reasoning steps. Consider answering the question “Did Brad Peyton need to know about seismology?” (Fig. 1).
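The voting step of self-consistency described above can be sketched in a few lines. This is a minimal illustration, assuming that each sampled chain has already been reduced to a final answer string; the function name `majority_vote` and the lowercase/strip normalization are assumptions made for the example, not details from Wang et al. (2023).

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(final_answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent final answer and its share of the votes.

    Only the final answers are inspected; the intermediate reasoning
    steps of each chain never reach this function.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    # Assumed normalization: ignore case and surrounding whitespace.
    normalized = [a.strip().lower() for a in final_answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


# Small output space: a clear majority forms.
answer, share = majority_vote(["yes", "Yes ", "no"])

# Large output space: every chain gives a different answer, so the
# "winner" holds only one third of the votes.
_, weak_share = majority_vote(["1912", "1913", "1915"])
```

Both shortcomings named in the text are visible here: the vote share collapses when the chains disagree, and the function signature has no access to the reasoning steps at all.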
Discussion / Conclusion. This work introduces MCR for meta-reasoning over multiple reasoning chains. We evaluate MCR on 7 datasets for multi-hop QA that require both implicit and explicit reasoning in an open-domain setting and show that it outperforms previous approaches on all evaluation benchmarks.