Ed-Tech Policy

States Testing Computer-Scored Essays

By Andrew Trotter, May 29, 2002 | Corrected: June 12, 2002

Corrected: This story originally gave an incorrect first name for the spokeswoman for the Pennsylvania Department of Education. She is Beth Gaydos.

Could a computer really be a good judge of student writing?

Pennsylvania education officials say yes. They have tested computerized essay scoring with about 30,000 students. Meanwhile, in Indiana, about 29,000 students are participating this spring in a pilot test of online essay-grading software designed by the Educational Testing Service.

Other states—and many educators—are watching those developments to decide if they should consider using such technology.

“One of our goals was to see how online scoring compared to human scoring—they both ranked very equally,” said Beth Gaydos, a spokeswoman for the Pennsylvania Department of Education.

Still, some educators and testing experts caution that essay-scoring systems are far from perfect, and that using them to evaluate students on high-stakes exams could be a mistake.

Pennsylvania conducted three pilot tests, from 1999 to 2001, of the Intellimetric essay-scoring system, which was developed by Yardley, Pa.-based Vantage Learning. Students in grades 6, 9, and 11 used the Web-based system to take reading and writing tests.

As it is, the state has no immediate plans to replace paper-and-pencil testing with Web-based assessments, Ms. Gaydos said. She said such a decision would have to consider whether all schools have the computer capabilities to administer such tests.

Indiana is conducting a test this spring of a competing essay-grading tool called the “e-rater,” which was developed by the ETS, based in Princeton, N.J. High school students whose schools volunteered for the trial were scheduled to take Indiana’s end-of-course test for English 11 online. That test is a mixture of multiple-choice items and essay questions.

Other states are watching the trial closely.

“We’re very excited about the potential” of essay-scoring technology, said Robert Olsen, the head of the online-assessment program for the Oregon Department of Education. Oregon is in the second year of pilot-testing a multiple-choice online assessment. (“Testing Computerized Exams,” May 23, 2001.)

Essay-scoring technology could soon be added to the Oregon system. “We are in the process of completing a study in Oregon to verify the reports of the vendor [Vantage Learning] in terms of its accuracy and utility,” Mr. Olson said, “and are very, very seriously looking at implementing it in this state.”

The Massachusetts Department of Education has also announced a test of an online writing-analysis tool that uses the Vantage Learning engine through the state’s “Virtual Education Space,” a Web site devoted to preparing students for state-sponsored assessments.

Testing the Software

If they prove effective, the new tools could have many benefits, some educators and policymakers say. Lessening the reliance on human scorers would reduce costs, for instance, and could help avert a possible shortage of scorers when state and federal mandates strain the capacity of testing programs over the next few years.

Some experts also argue that the tools could help improve online-testing systems that rely on multiple-choice questions, because tests with essay items are generally regarded as a more complete measure of student abilities than tests with multiple-choice items alone.

And online, computer-scored tests can return results to schools almost instantly, helping educators address students’ academic weaknesses soon after they’re spotted. Educators say it often takes months to get the results of paper tests.

ETS Technologies, the for-profit subsidiary of the nonprofit developer of the SAT college-entrance exam, approached the Indiana education department in January of this year and offered to set up a small pilot for online assessment, said Wes Bruce, the department’s director of the division of school assessment.

Indiana officials asked for a large-scale statewide trial that would use not the Indiana Statewide Testing for Educational Progress, the state’s high-stakes academic test, but the Core 40, a set of tests that the state has devised to get a sense of how students are performing in core academic courses. Those voluntary tests will become mandatory over the next few years.

“If you look at our [state educational accountability law], see all of its components, and the timeline for rolling it out, it will become particularly obvious why we piloted online testing this year,” said Mary Tiede Wilhelmus, the communications director of the state education department.

Human vs. Machine

People hired to score student essays typically have a four-year college degree and good writing skills, said Alison Lyden, an official at Data Recognition Corp., a testing company in Maple Grove, Minn. She said scorers, who are paid about $12 an hour, are trained before scoring student essays. And two people usually score each test independently.

Still, officials from the testing-technology companies suggest that the essay-scoring software can match the human scorers.

Generally, the computer scores a student response by comparing it with hundreds of human-scored responses to the same test item. If it looks most like a response that human experts have given, say, a 5 on a 1-to-5 scale, then the machine will assign it a 5.
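
The comparison described above can be illustrated with a minimal sketch. It is not any vendor's actual engine: it assumes a crude word-count representation of each essay, cosine similarity as the measure of "looks most like," and three made-up sample responses. The function names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (a crude bag-of-words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_essay(essay, scored_samples):
    """Return the human score attached to the most similar sample response."""
    target = word_counts(essay)
    _, best_score = max(
        scored_samples, key=lambda s: cosine(target, word_counts(s[0]))
    )
    return best_score


# Hypothetical human-scored responses on a 1-to-5 scale (invented for illustration).
SAMPLES = [
    ("The author uses vivid imagery and careful structure to build "
     "a persuasive argument about conservation.", 5),
    ("The story is about nature and the author likes trees.", 3),
    ("i dont know it was ok", 1),
]

strong = score_essay(
    "The author builds a persuasive argument with vivid imagery "
    "and careful structure.", SAMPLES)
weak = score_essay("it was ok i guess", SAMPLES)
print(strong, weak)
```

A real system would draw on hundreds of scored responses per item and far richer features than raw word counts; the sketch only shows the basic idea of borrowing the score of the closest human-scored example.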

The Intellimetric engine used in Pennsylvania is prepped by scanning in thousands of test items, said Scott Elliot, the chief operating officer of Vantage Learning, adding that he prefers to have 300 scored responses for each item on a test. “By learning the characteristics of 300 typical responses, it can apply that learning to score a novel response,” he said.

Once primed, the software looks for patterns in about 76 different features of the responses, some of which might not be readily discernible to every human scorer, the company maintains.

Some are structural, mechanical elements, such as spelling, punctuation, syntax, and subject-verb agreement. Other features involve content: “concepts and relationships among those concepts,” said Mr. Elliot.

“It ultimately comes down to vocabulary,” he said.

All those patterns, layered together and anchored in the human-scored samples, create an effective scorer, Mr. Elliot argued.

“The bottom line,” he said, “is our engine typically matches [human] experts more often than two [human] experts can match each other.”

And, the computer “doesn’t need a cigarette break, doesn’t need a cup of coffee, and scores the first and last essay the same,” he said.
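
One way to check a claim of this kind is to compare exact-agreement rates: how often the machine gives the same score as a human, versus how often two humans give the same score as each other. The sketch below assumes exact agreement is the measure (the article does not say which measure Vantage Learning uses), and the scores are invented purely to show the arithmetic.

```python
def exact_agreement(scores_a, scores_b):
    """Fraction of essays on which two scorers gave the identical score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and the same length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)


# Hypothetical scores for the same eight essays (not real data).
human_1 = [4, 3, 5, 2, 4, 3, 1, 5]
human_2 = [4, 3, 4, 2, 3, 3, 1, 5]
machine = [4, 3, 5, 2, 3, 3, 1, 5]

human_vs_human = exact_agreement(human_1, human_2)
machine_vs_human_1 = exact_agreement(machine, human_1)
machine_vs_human_2 = exact_agreement(machine, human_2)

print(human_vs_human, machine_vs_human_1, machine_vs_human_2)
```

In this made-up data the two humans agree on 6 of 8 essays, while the machine agrees with each human on 7 of 8. Whether a real engine shows that pattern is an empirical question that depends on the essays, the scoring scale, and the agreement measure chosen.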

The essay-scoring engine created by Knowledge Analysis Technologies uses another analytical method, called “latent semantic analysis,” that is based on a broader model of English, said Lynn A. Streeter, the business-development officer of the company, based in Boulder, Colo.

It involves creating three lexicons, or collections of words: The first is a general model of English for the typical test-taker, such as a college freshman; the second is words pertaining to the subject of the test; the third is specific to each essay question, she said.

Ms. Streeter claims that having the first “general semantic space” allows the computer to recognize student responses that might be further afield from the average. For example, she said, if the word “doctor” was consistently used in a sample essay question, “then somebody writes a test essay in which they refer to a dermatologist, in our model we’d know that it’s very close to doctor and essentially means almost same thing.”
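
The intuition behind the “doctor” and “dermatologist” example can be sketched with word co-occurrence vectors: words that appear alongside the same other words end up with similar vectors. This is a simplified stand-in, not latent semantic analysis itself, which also applies a dimensionality-reduction step (singular value decomposition) to a large term-by-document matrix. The tiny corpus, the stopword list, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list: common words that carry little topical meaning.
STOPWORDS = {"the", "and", "on", "with", "at", "through"}


def context_vectors(sentences):
    """Map each word to counts of the other words that share its sentences."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in re.findall(r"[a-z]+", sentence.lower())
                 if w not in STOPWORDS]
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented mini-corpus standing in for a "general model of English."
CORPUS = [
    "the doctor examined the patient and prescribed medicine",
    "the dermatologist examined the patient and prescribed cream",
    "the doctor treated the rash on the patient",
    "the dermatologist treated the rash with cream",
    "the pilot landed the plane at the airport",
    "the pilot flew the plane through the storm",
]

vectors = context_vectors(CORPUS)
doctor_dermatologist = cosine(vectors["doctor"], vectors["dermatologist"])
doctor_pilot = cosine(vectors["doctor"], vectors["pilot"])
print(doctor_dermatologist, doctor_pilot)
```

Here “doctor” and “dermatologist” score as similar because they keep the same company (“examined,” “patient,” “rash”), while “doctor” and “pilot” share no context words at all. A production system would learn such relationships from a much larger body of text.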

Potential Problems

But the use of essay-scoring software faces some big hurdles before becoming a part of state or federally mandated academic assessments. For starters, the uneven availability of computers and high-speed Internet connections in schools is a problem.

In addition, several studies by Boston College researchers suggest that students perform better on essay tests when the test-delivery method—whether on paper or computer—is the same method they use for regular writing assignments.

For now, Ms. Streeter said, machine scoring of essays is best used to grade practice tests or to help teachers wade through student writing exercises, which would allow them to assign more of them. “It should be more about helping a person, than ‘you flunk,’” she said.

For example, her company’s essay-scoring tool is used in a literacy project at the University of Colorado, called “Summary Street,” in which students in grades 3-12 write summaries of book chapters they have read. The computer gives feedback on how to improve their writing and concepts they have missed.

Michael K. Russell, a researcher at the Center for the Study of Testing, Evaluation, and Assessment, at Boston College, suggests that essay-scoring software might be best used as a diagnostic tool to analyze student essays to reveal misconceptions about academic topics.

Beyond that, Mr. Russell said, increased use of essay-scoring technologies must first be matched by more use of computers for student writing and classroom learning.

Coverage of technology is supported in part by the William and Flora Hewlett Foundation.

A version of this article appeared in the May 29, 2002 edition of Education Week as States Testing Computer-Scored Essays
