Automated writing evaluation for formative second language assessment: Exploring performance, teacher use, and student engagement
Abstract
The purpose of this dissertation is to investigate automated writing evaluation (AWE) from both system- and user-centric perspectives. The system-centric research focused on the error-detection and error-correction performance of the AWE system Grammarly. The study was based on fifty-three argumentative essay drafts written by undergraduate students enrolled in a second language (L2) writing course. Grammarly's feedback on those drafts was measured using precision (accuracy) and recall (system coverage) and compared against human annotators' feedback. Results revealed that Grammarly's precision rates for flagging and correcting errors (92% and 91%, respectively) exceeded the 80% benchmark, indicating that Grammarly accurately detected and corrected common L2 errors. However, its recall rate was low (51%), meaning that Grammarly missed about half of the errors identified by human annotators.

Two user-centric studies focused on teachers and students. The first explored six postsecondary L2 writing teachers' use and perceptions of Grammarly as a complement to their own feedback. The participants' feedback was analyzed to understand Grammarly's impact on their feedback activity, and the participants then took part in semi-structured interviews exploring their perceptions of Grammarly as a supplementary tool. Findings revealed that despite using Grammarly to complement their feedback, teachers still provided feedback on sentence-level issues. Overall, most teachers were positive about using Grammarly to complement their feedback, notwithstanding its limitations.

The second study explored two English as a second language (ESL) college students' behavioral, cognitive, and affective engagement with Grammarly's feedback while revising a final draft. Behavioral engagement was examined through the analysis of QuickTime-based screencasts of the students' Grammarly use.
Cognitive and affective engagement were measured through analysis of the students' comments during stimulated recalls of the aforementioned screencasts and through semi-structured interviews. Findings showed that one student demonstrated greater cognitive engagement by questioning the automated written corrective feedback (AWCF) but did little to verify its accuracy, which resulted in moderate changes to his draft. The other student's overreliance on AWCF indicated more limited cognitive engagement and led to blind acceptance of the feedback; nevertheless, this also resulted in moderate changes to her draft. The dissertation offers implications for the meaningful use of AWE in L2 writing classrooms.
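The precision and recall figures reported above follow the standard definitions from evaluation research: precision is the share of system flags confirmed as real errors, and recall is the share of human-annotated errors the system caught. A minimal sketch (the counts are hypothetical, chosen only to mirror the reported pattern of high precision with low recall, and are not the study's data):

```python
# Illustrative sketch, not from the dissertation: standard precision/recall
# computation for comparing an AWE system's error flags to human annotation.

def precision(true_positives: int, false_positives: int) -> float:
    """Share of the system's flags that annotators confirmed as real errors."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    """Share of annotator-identified errors that the system actually flagged."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts producing roughly the reported pattern (~92% / ~51%).
tp, fp, fn = 92, 8, 88
print(f"precision = {precision(tp, fp):.2f}")  # 0.92
print(f"recall    = {recall(tp, fn):.2f}")     # 0.51
```

A system can therefore look highly accurate (few false alarms) while still overlooking many errors, which is why the study reports both metrics rather than precision alone.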
Collections
- OSU Dissertations