AI Meets the Classroom: When Does ChatGPT Harm Learning?

Paper · arXiv 2409.09047 · Published August 29, 2024

In this paper, we study how generative AI, specifically large language models (LLMs), affects learning in coding classes. We show across three studies that LLM usage can have both positive and negative effects on learning outcomes. Using observational data from university-level programming courses, we establish such effects in the field. We replicate these findings in subsequent experimental studies, which closely resemble typical learning scenarios, to show causality. We find evidence for two contrasting mechanisms that determine the overall effect of LLM usage on learning. Students who use LLMs as personal tutors, conversing about the topic and asking for explanations, benefit from usage. However, learning is impaired for students who rely excessively on LLMs to solve practice exercises for them and thus do not invest sufficient mental effort of their own. Those who had never used LLMs before are particularly prone to such adverse behavior. Students without prior domain knowledge gain more from having access to LLMs. Finally, we show that the self-perceived benefits of using LLMs for learning exceed the actual benefits, potentially resulting in an overestimation of one’s own abilities.

Introduction. Unlike previous AI tools, generative AI is a general-purpose technology that can perform a wide range of knowledge tasks it was not specifically trained on. This applies especially to large language models (LLMs), the most prominent instance of generative AI, which generate text conditioned on user prompts. While LLMs are not yet good enough to perform many knowledge tasks autonomously, workers supported by LLMs significantly increase their productivity (Brynjolfsson et al. 2023, Dell’Acqua et al. 2023). This is particularly true in education, where students now have round-the-clock access to a “personal tutor” that answers questions, helps with homework, writes summaries, and clarifies difficult concepts. However, learning-by-doing, that is, deep personal engagement with the problems at hand, is a key part of knowledge work, including education (e.g., Narayanan et al. 2009, Staats and Gino 2012, Smilowitz and Keppler 2020). If LLMs replace some parts of that engagement, learning and subsequent performance may suffer.

Discussion / Conclusion. The findings from our studies can be summarized as follows. We find two contrasting effects of LLMs on student learning and thus reveal a more nuanced picture than concurrent research (e.g., Bastani et al. 2024, Nie et al. 2024) has painted. Using LLMs as personal tutors by asking them for explanations improves learning outcomes, whereas excessively asking LLMs to generate solutions impairs learning. By manipulating copy-and-paste availability, we show that copy-and-paste specifically enables this latter adverse behavior. The overall effect of LLM usage on learning results from a delicate balance between the two mechanisms, relying on LLM-generated solutions versus using LLMs as personal tutors, and can be positive or negative depending on the specific case. Further, students are affected asymmetrically depending on prior experience. Beginners, i.e., students without prior domain-specific knowledge, benefit more from LLM usage.

