Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

Paper · arXiv 2403.05020 · Published March 8, 2024
Social Theory and Society

Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.

Introduction. LLM-based social simulations have become great tools for modeling human behavior in various contexts (Park et al., 2023; Sreedhar and Chilton, 2024), understanding and measuring LLMs’ social skills through certain tasks (Zhou et al., 2024; Bianchi et al., 2024), and improving and aligning chatbot systems by providing training data (Kim et al., 2023a; Hong et al., 2023; Chen et al., 2023a; Pang et al., 2024). The success in replicating human-like social interactions across diverse domains offers a promising picture of the realistic social capabilities of LLMs. However, the role of information asymmetry in these simulations, i.e., the degree to which interlocutors in interactions have access to each other’s internal private mental states and goals, has been largely overlooked (Weber, 1978; Tomasello, 1999; Oey et al., 2023).

Discussion / Conclusion. We scrutinize recent advances in social simulation by evaluating current approaches’ ability to generalize to settings that are closer to human interaction. Focusing on cooperation and competition in information-asymmetric settings, we evaluate three modes of deploying LLMs based on past approaches in the literature. We find that LLMs continue to face challenges when operating in the more realistic AGENTS mode. Meanwhile, the simulations generated in the SCRIPT mode show biases toward exploiting white-box access to the participants early in the interaction. Furthermore, we find that finetuning models on these generations selectively improves a measure of goal completion from Sotopia, but it also transfers the implausible strategies of the ‘omniscient’ SCRIPT simulations to the student models, resulting in further bias. We find that generating simulations from a single LLM that has control over both sides results in substantially higher goal completion rates. Human conversation participants, however, need to contend with irreducible uncertainties that result from not having access to the mental states of their interlocutors. Therefore, successful human interaction is marked by the seamless navigation of this uncertainty (Hawkins et al., 2021; Pinker et al., 2008).
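The contrast between the SCRIPT and AGENTS modes comes down to which private goals are visible to the model generating a given turn. The following is a minimal sketch of that difference only, not the authors' framework: the names `Participant`, `visible_goals`, and `build_prompt`, the mode strings, and the prompt layout are all hypothetical, and the third mode mentioned above is left out because it is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical container for one side of a simulated interaction."""
    name: str
    private_goal: str  # hidden from the partner in a realistic setting


def visible_goals(mode: str, speaker: Participant, partner: Participant) -> dict[str, str]:
    """Return the goals that the model generating `speaker`'s turn may see."""
    if mode == "script":
        # Omniscient setting: one model writes both sides and sees every goal.
        return {speaker.name: speaker.private_goal,
                partner.name: partner.private_goal}
    if mode == "agents":
        # Information-asymmetric setting: each side sees only its own goal.
        return {speaker.name: speaker.private_goal}
    raise ValueError(f"unknown mode: {mode!r}")


def build_prompt(mode: str, speaker: Participant, partner: Participant,
                 history: list[str]) -> str:
    """Assemble an illustrative prompt for the speaker's next turn."""
    goals = visible_goals(mode, speaker, partner)
    goal_lines = [f"{name}'s goal: {goal}" for name, goal in goals.items()]
    return "\n".join([*goal_lines, "Conversation so far:", *history,
                      f"{speaker.name}:"])
```

Under these assumptions, a prompt built in `"script"` mode contains both participants' goals, while one built in `"agents"` mode contains only the speaker's own, which is the asymmetry that the AGENTS results above depend on.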