Using LLMs to bring evidence-based feedback into the classroom: AI-generated feedback increases secondary students' text revision, motivation and positive emotions

Journal article, Research, Peer reviewed

Publication data


By: Jennifer Meyer, Thorben Jansen, Ronja Schiller, Lucas Wilhelm Liebenow, Marlene Hanna Steinbach, Andrea Horbach, Johanna Fleckenstein
Original language: English
Published in: Computers & Education: Artificial Intelligence, 6, Article 100199
Editor (Publisher): Elsevier
ISSN: 2666-920X
DOI/Link: https://doi.org/10.1016/j.caeai.2023.100199 (Open Access)
Publication status: Published – 06.2024

Writing proficiency is an essential skill for upper secondary students that can be enhanced through effective feedback. Creating feedback on writing tasks, however, is time-intensive and presents a challenge for educators, often resulting in students receiving insufficient or no feedback. The advent of text-generating large language models (LLMs) offers a promising solution, namely, automated evidence-based feedback generation. Yet, empirical evidence from randomized controlled studies about the effectiveness of LLM-generated feedback is missing. To address this issue, the current study compared the effectiveness of LLM-generated feedback to no feedback. A sample of N = 459 upper secondary students of English as a foreign language wrote an argumentative essay. Students in the experimental group were asked to revise their text according to feedback that was generated using the LLM GPT-3.5-turbo. The control group revised their essays without receiving feedback. We assessed improvement in the revision using automated essay scoring. The results showed that LLM-generated feedback increased revision performance (d = 0.19) and task motivation (d = 0.36). Moreover, it increased positive emotions (d = 0.34) compared to revising without feedback. The findings highlight that using LLMs makes it possible to create timely feedback that can positively relate to students’ cognitive and affective-motivational outcomes. Future perspectives and the implications for research and practice of using LLM-generated feedback in intelligent tutoring systems are discussed.