In-context learning agents are asymmetric belief updaters
We study the in-context learning dynamics of large language models (LLMs) using three instrumental learning tasks adapted from cognitive psychology. We find that LLMs update their beliefs in an asymmetric manner and learn more from better-than-expected outcomes than from worse-than-expected ones. Furthermore, we show that this effect reverses when learning about counterfactual feedback and disappears when no agency is implied. We corroborate these findings by investigating idealized in-context learning agents derived through meta-reinforcement learning, where we observe similar patterns. Taken together, our results contribute to our understanding of how in-context learning works by highlighting that the framing of a problem significantly influences how learning occurs, a phenomenon also observed in human cognition.
Introduction. Large language models (LLMs) are powerful artificial systems that excel at many tasks (Radford et al., 2019). They can, among other things, write code (Roziere et al., 2023), help to translate from one language to another (Kocmi & Federmann, 2023), and play computer games (Wang et al., 2023). Their abilities are so far-reaching that some (Bubeck et al., 2023) have argued that they “could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system”. At the same time, they are notoriously difficult to interpret, which becomes especially problematic as these models permeate society. In the present paper, we aim to shed light on the in-context learning abilities of LLMs (Brown et al., 2020). Making use of the two-alternative forced choice (2AFC; Fechner, 1860) paradigm from cognitive science, we show that in-context learning implements an asymmetric updating rule when learning about the values of options.
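To make the notion of an asymmetric updating rule concrete, the following is a minimal sketch of a value update with separate learning rates for positive and negative prediction errors, a formulation commonly used in cognitive modeling. It is an illustration under stated assumptions, not the model fitted in this paper: the function names, learning-rate values, initial value, and reward sequence are all hypothetical.

```python
def asymmetric_update(value, reward, alpha_pos, alpha_neg):
    """One value update with separate learning rates (illustrative sketch).

    alpha_pos is applied to better-than-expected outcomes (positive
    prediction errors), alpha_neg to worse-than-expected ones.
    """
    delta = reward - value  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta


def run_updates(rewards, alpha_pos, alpha_neg, initial_value=0.5):
    """Apply the update sequentially to a list of observed rewards."""
    value = initial_value
    for reward in rewards:
        value = asymmetric_update(value, reward, alpha_pos, alpha_neg)
    return value


# Hypothetical reward sequence for a single option (1 = win, 0 = loss).
rewards = [1.0, 0.0, 1.0, 0.0]

# An "optimistic" learner weights positive prediction errors more heavily.
optimistic = run_updates(rewards, alpha_pos=0.4, alpha_neg=0.1)

# A symmetric learner uses the same rate for both kinds of error.
symmetric = run_updates(rewards, alpha_pos=0.25, alpha_neg=0.25)

print(f"optimistic estimate: {optimistic:.3f}")
print(f"symmetric estimate:  {symmetric:.3f}")
```

Given the same outcomes, the learner with alpha_pos > alpha_neg ends with a higher value estimate than the symmetric learner. Swapping the two rates yields the opposite, pessimistic pattern.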
Discussion / Conclusion. People change their learning strategies based on how the problem is framed (Palminteri & Lebreton, 2022). In this paper, we have shown that this also holds for in-context learning agents. In particular, we found that LLMs exhibit an optimism bias, i.e., they learn more from better-than-expected outcomes (positive prediction errors) than from worse-than-expected ones (negative prediction errors). However, this bias was only present when the prompt was formulated in a way that implied agency. Furthermore, we found that for counterfactual feedback about unchosen options, the bias reversed: the model learned more from negative than from positive prediction errors for these options. We conducted these analyses in a highly controlled setting, which lends our results high internal validity. However, claims regarding their external validity must be treated with care for now and remain for future studies to investigate. In the tasks we have investigated, it is rational to perform asymmetric belief updating, as indicated by our simulations with Meta-RL agents. It remains an open question whether this also holds in situations where asymmetric belief updating is suboptimal. Future work should aim to characterize whether in-context learning also displays asymmetric belief updating in such situations.