Special Report

Essay-Grading Software Seen as Time-Saving Tool

By Caralee J. Adams, March 10, 2014

Jeff Pence knows the best way for his 7th grade English students to improve their writing is to do more of it. But with 140 students, it would take him at least two weeks to grade a batch of their essays.

So the Canton, Ga., middle school teacher uses an online, automated essay-scoring program that allows students to get feedback on their writing before handing in their work.

“It doesn’t tell them what to do, but it points out where issues may exist,” said Mr. Pence, who says Pearson’s WriteToLearn program engages the students almost like a game.

With the technology, he has been able to assign an essay a week and individualize instruction efficiently. “I feel it’s pretty accurate,” Mr. Pence said. “Is it perfect? No. But when I reach that 67th essay, I’m not real accurate, either. As a team, we are pretty good.”

With the push for students to become better writers and meet the new Common Core State Standards, teachers are eager for new tools to help out. Pearson, which is based in London and New York City, is one of several companies upgrading their technology in this space, which is known variously as artificial intelligence, AI, or machine reading. New assessments designed to test deeper learning and move beyond multiple-choice answers are also fueling demand for software that automates the scoring of open-ended questions.

Critics contend the software doesn’t do much more than count words and therefore can’t replace human readers, so researchers are working hard to improve the software algorithms and counter the naysayers.

While the technology has been developed primarily by companies in proprietary settings, there has been a new focus on improving it through open-source platforms. New players in the market, such as the startup venture LightSide and edX, the nonprofit enterprise started by Harvard University and the Massachusetts Institute of Technology, are openly sharing their research. Last year, the William and Flora Hewlett Foundation sponsored an open-source competition to spur innovation in automated writing assessments that attracted commercial vendors and teams of scientists from around the world. (The Hewlett Foundation supports coverage of “deeper learning” issues in Education Week.)

“We are seeing a lot of collaboration among competitors and individuals,” said Michelle Barrett, the director of research systems and analysis for CTB/McGraw-Hill, which produces the Writing Roadmap for use in grades 3-12. “This unprecedented collaboration is encouraging a lot of discussion and transparency.”

Mark D. Shermis, an education professor at the University of Akron, in Ohio, who supervised the Hewlett contest, said the meeting of top public and commercial researchers, along with input from a variety of fields, could help boost performance of the technology. The recommendation from the Hewlett trials is that the automated software be used as a “second reader” to monitor the human readers’ performance or provide additional information about writing, Mr. Shermis said.

“The technology can’t do everything, and nobody is claiming it can,” he said. “But it is a technology that has a promising future.”

‘Hot Topic’

The first automated essay-scoring systems go back to the early 1970s, but there wasn’t much progress made until the 1990s with the advent of the Internet and the ability to store data on hard-disk drives, Mr. Shermis said. More recently, improvements have been made in the technology’s ability to evaluate language, grammar, mechanics, and style; detect plagiarism; and provide quantitative and qualitative feedback.

The computer programs assign grades to writing samples, sometimes on a scale of 1 to 6, in a variety of areas, from word choice to organization. Some products give feedback to help students improve their writing; others can grade short answers for content. To save time and money, the technology can be used in various ways on formative exercises or summative tests.

The Educational Testing Service first used its e-rater automated-scoring engine for a high-stakes exam in 1999 for the Graduate Management Admission Test, or GMAT, according to David Williamson, a senior research director for assessment innovation for the Princeton, N.J.-based company. It also uses the technology in its Criterion Online Writing Evaluation Service for grades 4-12.

Over the years, the capabilities have changed substantially, evolving from simple rule-based coding to more sophisticated software systems. Statistical techniques from computational linguistics, natural language processing, and machine learning have helped develop better ways of identifying certain patterns in writing.

But challenges remain in coming up with a universal definition of good writing, and in training a computer to understand nuances such as “voice.”

In time, with larger sets of data, experts will be able to identify more nuanced aspects of writing and improve the technology, said Mr. Williamson, who is encouraged by the new era of openness about the research.

“It’s a hot topic,” he said. “There are a lot of researchers and academia and industry looking into this, and that’s a good thing.”

High-Stakes Testing

In addition to using the technology to improve writing in the classroom, West Virginia employs automated software for its statewide annual reading language arts assessments for grades 3-11. The state has worked with CTB/McGraw-Hill to customize its product and train the engine, using thousands of papers it has collected, to score the students’ writing based on a specific prompt.

“We are confident the scoring is very accurate,” said Sandra Foster, the lead coordinator of assessment and accountability in the West Virginia education office, who acknowledged initially facing skepticism from teachers. But many were won over, she said, after a comparability study showed that a trained teacher paired with the scoring engine performed better than two trained teachers scoring together. The teachers’ training involved a few hours on how to score essays against the writing rubric. Plus, writing scores have gone up since the state began using the technology.

Automated essay scoring is also used on the ACT Compass exams for community college placement, the new Pearson General Educational Development tests for a high school equivalency diploma, and other summative tests. But it has not yet been embraced by the College Board for the SAT or the rival ACT college-entrance exams.

The two consortia delivering the new assessments under the Common Core State Standards are reviewing machine-grading but have not committed to it.

Jeffrey Nellhaus, the director of policy, research, and design for the Partnership for Assessment of Readiness for College and Careers, or PARCC, wants to know if the technology will be a good fit with its assessment, and the consortium will be conducting a study based on writing from its first field test to see how the scoring engine performs.

Likewise, Tony Alpert, the chief operating officer for the Smarter Balanced Assessment Consortium, said his consortium will evaluate the technology carefully.

Open-Source Options

Elijah Mayfield, the owner of LightSide, a new company in Pittsburgh, said his data-driven approach to automated writing assessment sets it apart from other products on the market.

“What we are trying to do is build a system that instead of correcting errors, finds the strongest and weakest sections of the writing and where to improve,” he said. “It is acting more as a revisionist than a textbook.”

The new software, which is available on an open-source platform, is being piloted this spring in districts in Pennsylvania and New York.

In higher education, edX has just introduced automated software to grade open-response questions for use by teachers and professors in its free online courses. “One of the challenges in the past was that the code and algorithms were not public. They were seen as black magic,” said edX President Anant Agarwal, noting that the technology is in an experimental stage. “With edX, we put the code into open source where you can see how it is done to help us improve it.”

Still, critics of essay-grading software, such as Les Perelman, want academic researchers to have broader access to vendors’ products to evaluate their merit. Now retired, the former director of the MIT Writing Across the Curriculum program has studied some of the programs and was able to get a high score from one with an essay of gibberish.

“My main concern is that it doesn’t work,” he said. While the technology has some limited use in grading short answers for content, Mr. Perelman contended, it relies too much on counting words, and reading an essay requires a deeper level of analysis that is best done by a human.

“The real danger of this is that it can really dumb down education,” he said. “It will make teachers teach students to write long, meaningless sentences and not care that much about actual content.”
