What question did this study set out to answer?

This research aims to evaluate the effectiveness of the WriteToLearn tool in scoring essays and providing feedback for Chinese undergraduate English majors.

January 8, 2016

Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of WriteToLearn

Key Points

163 second-year English majors participated, writing 326 essays across two prompts.
Each essay was scored by WriteToLearn and by four trained human raters; many-facet Rasch measurement (MFRM) was used to calibrate WriteToLearn’s ratings against those of the human raters.
The accuracy of WriteToLearn’s error feedback was checked on 60 randomly selected essays by comparing it with feedback from human raters.
WriteToLearn scored more consistently than the four human raters but was considerably more stringent.
It failed to score 7 of the essays.
Error detection was weak: overall precision was 49% and recall was 18.7%, well below the 90% minimum precision that Burstein, Chodorow, and Leacock (2003) set for a reliable error-detection tool.

Abstract

This study investigated the application of WriteToLearn on Chinese undergraduate English majors’ essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was marked by four human raters as well as WriteToLearn. Many-facet Rasch measurement (MFRM) was conducted to calibrate WriteToLearn’s rating performance in scoring the whole set of essays against those of four trained human raters. The accuracy of WriteToLearn’s feedback on 60 randomly selected essays was compared with the feedback provided by human raters. The two main findings related to scoring were that WriteToLearn was more consistent but highly stringent relative to the four trained human raters in scoring essays and that it failed to score 7 essays. In terms of error feedback, WriteToLearn had an overall precision and recall of 49% and 18.7% respectively. These figures did not meet the minimum threshold of 90% precision for it to be a reliable error detecting tool set by Burstein, Chodorow, and Leacock (2003). Furthermore, it had difficulty in identifying the errors made by Chinese undergraduate English majors in the use of articles, prepositions, word choice and expression.
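For readers unfamiliar with the two error-feedback metrics, the sketch below shows how precision and recall are usually computed and how the 90% precision threshold would be checked. The function name and the counts are hypothetical: the abstract reports only the percentages, so the counts here were chosen purely so that the arithmetic reproduces 49% and 18.7%. They are not data from the study.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for an error-detection tool.

    precision: share of the errors the tool flagged that are real errors.
    recall: share of the real errors (as marked by human raters) that the tool flagged.
    """
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only; NOT taken from the study.
tp, fp, fn = 49, 51, 213

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.1%}, recall={recall:.1%}")

# Minimum precision for a reliable tool, per Burstein, Chodorow, and Leacock (2003).
PRECISION_THRESHOLD = 0.90
meets_threshold = precision >= PRECISION_THRESHOLD
print("meets 90% precision threshold:", meets_threshold)
```

Read this way, a precision of 49% means roughly half of the errors the tool flagged were not errors at all, while a recall of 18.7% means it missed more than four in five of the errors the human raters identified.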
