Large language models can segment narrative events similarly to humans

Paper · arXiv 2301.10297 · Published January 24, 2023
Cognitive Models and Latent Representations · Co-Writing and Collaboration · Natural Language Inference

Humans perceive discrete events such as "restaurant visits" and "train rides" in their continuous experience. One important prerequisite for studying human event perception is the ability of researchers to quantify when one event ends and another begins. Typically, this information is derived by aggregating behavioral annotations from several observers. Here we present an alternative computational approach where event boundaries are derived using a large language model, GPT-3, instead of using human annotations. We demonstrate that GPT-3 can segment continuous narrative text into events. GPT-3-annotated events are significantly correlated with human event annotations. Furthermore, these GPT-derived annotations achieve a good approximation of the “consensus” solution (obtained by averaging across human annotations); the boundaries identified by GPT-3 are closer to the consensus, on average, than boundaries identified by individual human annotators. This finding suggests that GPT-3 provides a feasible solution for automated event annotations, and it demonstrates a further parallel between human cognition and prediction in large language models. In the future, GPT-3 may thereby help to elucidate the principles underlying human event perception.
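As a rough illustration of the comparison described above, the following sketch averages binary boundary annotations from several observers into a "consensus" time course and correlates a model's boundary vector with it. The toy data, the sentence-level binary encoding, and the helper names `consensus` and `pearson` are all assumptions made for this example; it is not the paper's actual analysis pipeline.

```python
from statistics import mean


def consensus(annotations):
    """Average binary boundary vectors across annotators, position by position."""
    return [mean(column) for column in zip(*annotations)]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: 1 marks an event boundary after a sentence, 0 marks none.
human_annotations = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
]
model_boundaries = [0, 1, 0, 0, 1, 0]

# Agreement between the model and the average of all human annotators.
model_r = pearson(model_boundaries, consensus(human_annotations))

# Leave-one-out baseline (an assumption of this sketch): each annotator is
# compared with the consensus of the remaining annotators.
human_rs = []
for i, annotation in enumerate(human_annotations):
    others = human_annotations[:i] + human_annotations[i + 1:]
    human_rs.append(pearson(annotation, consensus(others)))

print(f"model vs. consensus: r = {model_r:.3f}")
print("humans vs. leave-one-out consensus:", [f"{r:.3f}" for r in human_rs])
```

With real data, comparing `model_r` against the distribution of `human_rs` would give one simple way to ask whether the model sits closer to the consensus than a typical individual annotator does.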

Introduction. Humans perceive events in continuous experience (e.g., "restaurant visits" and "train rides"; Zacks et al., 2007). This inferred event structure has been shown to play a key role in numerous cognitive functions (see Section 2). Researchers have studied event cognition extensively in controlled settings where event structure is predetermined by the experimenter (e.g., DuBrow and Davachi, 2016), but recently there has been renewed interest in studying event cognition in more ecological settings (see Sonkusare et al., 2019). Studying event cognition with naturalistic stimuli like movies and stories typically involves laborious hand annotation of event boundaries that are often crowd-sourced from large behavioral samples in online experiments (e.g., Michelmann et al., 2021, 2022). Because annotations may vary between participants, those large samples approximate a shared perception

Discussion / Conclusion. Here we demonstrate that GPT-3 can segment a continuous narrative into events, akin to the way human annotators perform this task. Notably, the present results may represent a lower bound of the similarity between GPT-3-derived annotations and human event perception, because human participants in our study segmented the audio form of the narrative, whereas GPT-3 only had access to the text itself. In the audio form of the narrative, the speaker may convey information about event structure via pauses and intonation that are absent from the text, and may also obscure events by rapidly connecting consecutive sentences. Furthermore, GPT-3 performed event segmentation based on a full text segment, so the model could use information from both the current and the next event to place an event boundary. Human annotators, on the other hand, had to make button-press decisions based only on past information (with the notable exception of the second behavioral run of the "Pieman" story, where human participants could use their memory of the text to improve their segmentation).