Empirical Study of Symmetrical Reasoning in Conversational Chatbots
Abstract. This work explores the capability of conversational chatbots powered by large language models (LLMs) to understand and characterize predicate symmetry, a cognitive linguistic function traditionally believed to be an inherent human trait. Leveraging in-context learning (ICL), a paradigm that enables chatbots to learn new tasks from prompts without re-training, we assess the symmetrical reasoning of five chatbots: ChatGPT 4, Huggingface chat AI, Microsoft’s Copilot AI, LLaMA through Perplexity, and Gemini Advanced. Using the Symmetry Inference Sentence (SIS) dataset by Tanchip et al. (2020), we compare chatbot responses against human evaluations to gauge their understanding of predicate symmetry. Experimental results reveal varied performance among chatbots, with some approaching human-like reasoning capabilities. Gemini, for example, reaches a correlation of 0.85 with human scores while providing a sound justification for each symmetry evaluation. This study underscores the potential and limitations of LLMs in mirroring complex cognitive processes such as symmetrical reasoning.
Introduction. Since 2022, the release of AI conversational chatbots (also known as intelligent virtual assistants) has marked a pivotal moment in human-computer interaction, suggesting a potential paradigm shift in how humans engage with computational systems [1, 2, 3]. These tools often match or exceed human performance on several language understanding tasks [4]. The success of the pre-trained large language models (LLMs) that power the chatbots rests on a combination of training data quality, carefully crafted and tuned architectures, optimization techniques, and available computational resources [5, 6]. In particular, better architectures, optimization, and computational resources have allowed these models to grow to the order of a billion to a trillion trainable parameters. This scale has allowed LLMs to be trained on diverse natural language tasks such as text generation, language translation, text classification, dialogue systems and chatbots, natural language inference (NLI), and several others.
Discussion / Conclusion. This study highlights symmetry as an essential feature of human language cognition and its potential presence in advanced conversational chatbots. Our empirical investigation using ICL on the SIS dataset revealed that specific chatbots demonstrate an aptitude for symmetrical reasoning. This finding underscores the increasing sophistication of LLMs and their ability to mirror aspects of human cognition. The tested chatbots nevertheless varied in performance and in their understanding of symmetry, which suggests that while LLMs show promise, their grasp of linguistic nuances may still be uneven. Notably, Gemini and HuggingChat exhibited ICL capabilities comparable to fine-tuned language models like BERT, even without explicit training on symmetry features. Gemini, in particular, displayed a strong correlation with human evaluators in its symmetry judgments. This finding calls for further study of the mechanisms by which LLMs may implicitly acquire and process complex linguistic patterns.
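As an illustration of the kind of comparison reported above, the following minimal sketch computes a Pearson correlation between chatbot symmetry scores and human ratings. It assumes that the reported correlation is a Pearson coefficient and that both sets of scores are numeric and aligned sentence by sentence; neither assumption is confirmed by the text here. The function name `pearson` and all score values are hypothetical placeholders, not data from the SIS dataset or from the experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values (NOT results from the study):
# one symmetry rating per sentence, same sentence order in both lists.
human_scores = [4.6, 1.2, 3.8, 2.1, 4.9]
chatbot_scores = [4.0, 1.5, 3.5, 2.5, 5.0]

r = pearson(human_scores, chatbot_scores)
print(f"Pearson r between chatbot and human scores: {r:.2f}")
```

A rank-based measure such as Spearman correlation could be substituted if the ratings are treated as ordinal rather than interval data.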