Fairness in Automated Essay Scoring: A comparative analysis of algorithms on German learner essays from secondary education
Article in conference proceedings › Research › peer-reviewed
Publication data
By | Nils-Jonathan Schaller, Yuning Ding, Andrea Horbach, Jennifer Meyer, Thorben Jansen |
Original language | English |
Published in | Ekaterina Kochmar, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan (Eds.), Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024) |
Pages | 210-221 |
Publisher | Association for Computational Linguistics |
ISBN | 979-8-89176-100-1 |
DOI/Link | https://aclanthology.org/2024.bea-1.18/ |
Publication status | Published – 06.2024 |
Pursuing educational equity, particularly in writing instruction, requires that all students receive fair (i.e., accurate and unbiased) assessment and feedback on their texts. Automated Essay Scoring (AES) algorithms have so far focused on optimizing the mean accuracy of their scores and paid less attention to fair scores for all subgroups, although research shows that students receive unfair scores on their essays in relation to demographic variables, which in turn are related to their writing competence. We add to the literature arguing that AES should also optimize for fairness by presenting insights on the fairness of scoring algorithms on a corpus of learner texts in the German language, and we introduce the novelty of examining fairness with respect to psychological differences in addition to demographic differences. We compare shallow learning, deep learning, and large language models with full and skewed subsets of training data to investigate what is needed for fair scoring. The results show that training on a skewed subset of higher and lower cognitive ability students yields no bias but very low accuracy for students outside the training set. Our results highlight the need for specific training data on all relevant user groups, not only for demographic background variables but also for cognitive abilities as psychological student characteristics.