Grading Automated Essay Scoring Programs, Part III: Classrooms (Opinion)


Justin Reich

Assistant Professor of Digital Media and Director of the Teaching Systems Lab, MIT

The Backstory

This is the final post in my three-part series on Automated Essay Score Predictors. The series was sparked by a study, funded by the Hewlett Foundation, showing computer programs to be as reliable as humans in grading essays, and by a subsequent conversation with Will Richardson and Vantage Learning. Part I of the series explained how these tools work; Part II made the case that they could allow for better assessments, which in turn could drive richer classroom experiences.

In this series, I have been trying to explain how these tools work and then suggest how they might benefit a progressive vision of student learning. Automated graders seem to inspire the kind of antipathy that educators once reserved for spellcheck, blogs, Wikipedia, and Twitter. Like automated graders, these are all technologies that educators once widely believed (and, in Twitter's case, may still believe) would cheapen and degrade communication and human writing. My sense is that "robo-graders" provoke a natural revulsion in most educators because they disrupt closely held beliefs about writing, but it's worth exploring what it would mean to leverage these tools to promote better learning outcomes.

Part of this post is meant to address two posts by my much-admired colleagues Bud Hunt and Audrey Watters. Audrey asks:

“Robots can give a grade. That grade, according to the study by Shermis and Hamner, approximates that of a human. But a human can offer so much more than just a grade. And without feedback - substantive, smart, caring, thoughtful, difficult feedback - what’s the point?”

In this final post, I offer a scenario of how Automated Essay Score Predictors could be used in a progressive history course in an elite private school.

An Assignment from a Progressive History Classroom

When I taught history at the Noble and Greenough School, I helped redesign its freshman world history course. We changed it from a death march through the textbook into an exercise in which we traced the identities of partisans in three historical conflicts (Israel/Palestine, Indian independence, and Bosnia) back to their historical roots. At the end of the course, students chose another modern conflict and created teaching units that could be used in future years.

The first unit on Israel/Palestine culminated in a simulated peace negotiation between Israelis and Palestinians. To prepare, one exercise along the way had students read the Balfour Declaration and then write two paragraphs describing their reactions to it: one from the perspective of an Arab Muslim and one from the perspective of a Jewish settler. It's a fun assignment to evaluate, and one of my first chances of the year to help students nurture their perspective-taking skills. With 80 students, if I spend 3 minutes evaluating each submission, it will take me 240 minutes, or four hours. In three minutes I can assign each student a score (say, from 1 to 6), write perhaps a one-sentence comment, and add a few marginal notes. At a huge cost in time (four prep periods, two evenings, and half a weekend day), I can give each student a little bit of feedback.

Formative Feedback with Robo-Graders

Let’s say I restructure my feedback plan slightly to take advantage of Automated Essay Score Predictors, although in this scenario I don’t get any benefits until year 2.

Before I evaluate the essays, I'm going to craft six feedback messages that I anticipate needing repeatedly. One might be "This paragraph starts with a fact. In short expository writing, it's often more effective to start with your argument, and then support that argument with evidence." Another might be "You make a clear argument here, but you need to support your assertions with evidence from the Balfour Declaration and your knowledge of the period." A third might be "It is not clear what the argument of this paragraph is. Re-read the paragraph, and try to craft a single sentence that summarizes the key point you are trying to convey."

I have students submit their essays to the Lightside add-in for Moodle (no such add-in exists yet, but one is technically very feasible). Lightside is a free, open-source automated essay scoring tool. When I evaluate student essays, I give each one a 1-6 grade, check the boxes for any of the six pre-scripted messages that apply, and write any additional comments I'd like to make. In year 1, this feedback is all that students get.

Fast forward to year 2. Students do the same assignment (my curriculum evolves from year to year, but good stuff is retained). They submit it to the Lightside add-in for Moodle, but this year something very different happens: Lightside uses my feedback from last year to give this year's students immediate feedback. Upon receiving a submission, Lightside instantly sends a message saying something like "Essays similar to this one earned a 4/6 on this assignment. Essays similar to this one also received the following feedback: 'You make a clear argument here, but you need to support your assertions with evidence from the Balfour Declaration and your knowledge of the period.' Please review your submission, and see if this feedback helps you improve it." Instead of waiting at least three days for feedback from me, students instantly get advice they can use to sharpen their writing.
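The year-2 behavior described above amounts to a prediction from last year's graded essays: find the essays most similar to a new submission, then report their typical score and the feedback they shared. Here is a minimal Python sketch of that idea, assuming a simple k-nearest-neighbor method over bag-of-words word counts. The function names (`bag_of_words`, `cosine`, `predict_feedback`), the data layout, the message IDs, and the sample essays are all hypothetical illustrations, not Lightside's actual method or API.

```python
import math
from collections import Counter

# Hypothetical sketch: names, data layout, and the k-nearest-neighbor method
# are illustrative assumptions, not Lightside's actual implementation.


def bag_of_words(text):
    """Lower-cased word counts with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_feedback(new_essay, graded_essays, k=3):
    """Predict a 1-6 score and feedback IDs from the k most similar graded essays."""
    if not graded_essays:
        raise ValueError("need at least one graded essay from a previous year")
    target = bag_of_words(new_essay)
    neighbors = sorted(
        graded_essays,
        key=lambda e: cosine(target, bag_of_words(e["text"])),
        reverse=True,
    )[:k]
    # Predicted score: the neighbors' mean, rounded (Python rounds .5 to even).
    score = round(sum(e["score"] for e in neighbors) / len(neighbors))
    # Suggest a message only if a majority of the neighbors received it.
    votes = Counter(fid for e in neighbors for fid in e["feedback"])
    messages = sorted(fid for fid, n in votes.items() if n > len(neighbors) / 2)
    return score, messages


# Hypothetical year-1 records: text, teacher's 1-6 score, checked message IDs.
last_year = [
    {"text": "The British promise a homeland but ignore Arab rights", "score": 4, "feedback": ["needs_evidence"]},
    {"text": "The Balfour Declaration was written in 1917", "score": 2, "feedback": ["starts_with_fact"]},
    {"text": "Arab rights are ignored because the British promise a homeland", "score": 4, "feedback": ["needs_evidence"]},
    {"text": "In 1917 the Balfour Declaration was issued", "score": 3, "feedback": ["starts_with_fact"]},
]

print(predict_feedback("The British promise ignores Arab rights", last_year, k=2))
# → (4, ['needs_evidence'])
```

A working tool would need many more graded essays per score level and a richer text representation than raw word counts; the majority-vote rule for choosing messages is just one plausible design choice.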

Not only do the students receive instant feedback on their submissions, but as an instructor, I receive a report that details the overall performance of the students. Perhaps my report indicates that 51 out of 80 students received the message: “This paragraph starts with a fact. In short expository writing, it’s often more effective to start with your argument, and then support that argument with evidence.”
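At bottom, that class-wide report is a tally of which pre-scripted messages each student received. The sketch below assumes hypothetical names throughout (`feedback_report`, the student IDs, and the message IDs); the two message texts are comments quoted earlier in this post. Each student is counted at most once per message.

```python
from collections import Counter

# Hypothetical sketch: function name, student IDs, and message IDs are illustrative.
MESSAGES = {
    "starts_with_fact": (
        "This paragraph starts with a fact. In short expository writing, it's often "
        "more effective to start with your argument, and then support that argument with evidence."
    ),
    "needs_evidence": (
        "You make a clear argument here, but you need to support your assertions with "
        "evidence from the Balfour Declaration and your knowledge of the period."
    ),
}


def feedback_report(assigned, messages):
    """Summarize how many students received each message, most common first."""
    total = len(assigned)
    # set() ensures a student is counted once even if a message was attached twice.
    counts = Counter(fid for fids in assigned.values() for fid in set(fids))
    return [f"{n} out of {total} students received: {messages[fid]}" for fid, n in counts.most_common()]


# Hypothetical automated feedback for three students.
assigned = {
    "student_01": ["starts_with_fact"],
    "student_02": ["starts_with_fact", "needs_evidence"],
    "student_03": [],
}

for line in feedback_report(assigned, MESSAGES):
    print(line)
```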

Many students will have some sense of how to respond to that feedback, but many won't know what to do. So in class the following day, we do two things. First, I teach a short mini-lesson on writing topic sentences for paragraphs, offering some general principles and workshopping a couple of examples on the board. I video record this mini-lesson so that in year 3, the Lightside add-in can link to it whenever it gives the related feedback. Second, I give students 10 minutes to peer edit each other's topic sentences. Finally, for homework, I have them all revise their paragraphs and resubmit them.

Now, when those final submissions come in, I still grade each one for 3 minutes, 240 minutes total, but that time is spent very differently. Students have used my algorithmically generated feedback to improve their pieces beyond what the previous year's students achieved. Having used technology to coach students algorithmically, I can now use my specialized experience as a teacher to keep pushing students to the next level. This is exactly how I have my students use spellcheck, grammar check, plagiarism check, and other algorithmic tools now. It's worth noting that many educators decried spellcheck and grammar check as degrading student writing when they first appeared, yet most professional writers now rely on these two tools, and most educators expect students to use these algorithmic writing coaches, since they free students to focus on more cognitively demanding editing.

What Role Would Automated Feedback Play for Teachers with over 150 Students?

Now, I have little interest in supporting student writing in elite private schools. Those kids are fine. I'm much more interested in supporting colleagues like the Tennessee history educators I recently spoke to, who routinely teach 5-6 sections of 35-40 students each. If a teacher with six sections of 40 students wants to spend five minutes evaluating each student's work, that teacher needs to invest 1,200 minutes, or 20 hours (5 min/student × 40 students = 200 min/class; 200 min/class × 6 classes = 1,200 min). That's half a working week to give each student five minutes of individualized coaching. Protocols like the one I've described here could radically improve the level of feedback that students get in those classes. (Also, I'm all for advocating for doubling per-pupil expenditures, reducing teacher-student ratios, etc. But in the meantime...)

Writing touches teachers' hearts. It is at the core of our work with students, and educators react with initial revulsion to technologies that change that core. My own initial reaction to "robo-graders" is revulsion as well. But when I set that revulsion aside, I see the same possibility inherent in the best applications of technology: the ability to push teaching and learning further toward a student-centered vision by reducing the transaction costs of various forms of communication.

The opinions expressed in EdTech Researcher are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
