Ed-Tech Policy

States Testing Computer-Scored Essays

By Andrew Trotter, May 29, 2002

Could a computer really be a good judge of student writing?

Pennsylvania education officials say yes. They have tested computerized essay scoring with about 30,000 students. Meanwhile, in Indiana, about 29,000 students are participating this spring in a pilot test of online essay-grading software designed by the Educational Testing Service.

Other states—and many educators—are watching those developments to decide if they should consider using such technology.

“One of our goals was to see how online scoring compared to human scoring—they both ranked very equally,” said Mary Gaydos, a spokeswoman for the Pennsylvania Department of Education.

Still, some educators and testing experts caution that essay-scoring systems are far from perfect, and that using them to evaluate students on high-stakes exams could be a mistake.

Pennsylvania conducted three pilot tests, from 1999 to 2001, of the Intellimetric essay-scoring system, which was developed by Yardley, Pa.-based Vantage Learning. Students in grades 6, 9, and 11 used the Web-based system to take reading and writing tests.

As it is, the state has no immediate plans to replace paper-and-pencil testing with Web-based assessments, Ms. Gaydos said. She said such a decision would have to consider whether all schools have the computer capabilities to administer such tests.

Indiana is conducting a test this spring of a competing essay-grading tool called the “e-rater,” which was developed by the ETS, based in Princeton, N.J. High school students whose schools volunteered for the trial were scheduled to take Indiana’s end-of-course test for English 11 online. That test is a mixture of multiple-choice items and essay questions.

Other states are watching the trial closely.

“We’re very excited about the potential” of essay-scoring technology, said Robert Olsen, the head of the online-assessment program for the Oregon Department of Education. Oregon is in the second year of pilot-testing a multiple-choice online assessment. (“Testing Computerized Exams,” May 23, 2001.)

Essay-scoring technology could soon be added to the Oregon system. “We are in the process of completing a study in Oregon to verify the reports of the vendor [Vantage Learning] in terms of its accuracy and utility,” Mr. Olsen said, “and are very, very seriously looking at implementing it in this state.”

The Massachusetts Department of Education has also announced a test of an online writing-analysis tool that uses the Vantage Learning engine through the state’s “Virtual Education Space,” a Web site devoted to preparing students for state-sponsored assessments.

Testing the Software

If they prove effective, the new tools could have many benefits, some educators and policymakers say. Lessening the reliance on human scorers would reduce costs, for instance, and could help avert a possible shortage of scorers when state and federal mandates strain the capacity of testing programs over the next few years.

Some experts also argue that the tools could help improve online-testing systems that rely on multiple-choice questions, because tests with essay items are generally regarded as a more complete measure of student abilities than tests with multiple-choice items alone.

And online, computer-scored tests can return results to schools almost instantly, helping educators address students’ academic weaknesses soon after they’re spotted. Educators say it often takes months to get the results of paper tests.

ETS Technologies, the for-profit subsidiary of the nonprofit developer of the SAT college-entrance exam, approached the Indiana education department in January of this year and offered to set up a small pilot for online assessment, said Wes Bruce, the director of the department’s division of school assessment.

Indiana officials asked for a large-scale statewide trial that would use not the Indiana Statewide Testing for Educational Progress, the state’s high-stakes academic test, but the Core 40, a set of tests that the state has devised to get a sense of how students are performing in core academic courses. Those voluntary tests will become mandatory over the next few years.

“If you look at our [state educational accountability law], see all of its components, and the timeline for rolling it out, it will become particularly obvious why we piloted online testing this year,” said Mary Tiede Wilhelmus, the communications director of the state education department.

Human vs. Machine

People hired to score student essays typically have a four-year college degree and good writing skills, said Alison Lyden, an official at Data Recognition Corp., a testing company in Maple Grove, Minn. She said scorers, who are paid about $12 an hour, are trained before scoring student essays. And two people usually score each test independently.

Still, officials from the testing-technology companies suggest that the essay-scoring software can match the human scorers.

Generally, the computer scores a student response by comparing it with hundreds of human-scored responses to the same test item. If it looks most like a response that human experts have given, say, a 5 on a 1-to-5 scale, then the machine will assign it a 5.
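The comparison described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the Intellimetric or e-rater method: it assumes responses are compared only by word overlap (cosine similarity of word counts), and the function names and sample essays are invented. The new response simply inherits the score of its closest human-scored neighbor.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated words (a deliberately crude feature set)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_by_nearest(response, scored_samples):
    """Return the human-assigned score of the most similar scored response."""
    target = bag_of_words(response)
    _, best_score = max(scored_samples, key=lambda s: cosine(target, bag_of_words(s[0])))
    return best_score


# Hypothetical human-scored responses on a 1-to-5 scale.
samples = [
    ("the author uses vivid evidence to support a clear argument", 5),
    ("the story is good and i liked it", 2),
]
print(score_by_nearest("the author supports a clear argument with evidence", samples))  # 5
```

A working system would compare against hundreds of scored responses and far richer features, as the companies describe; a single nearest neighbor is used here only to mirror the "looks most like" description.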

The Intellimetric engine used in Pennsylvania is prepped by scanning in thousands of test items, said Scott Elliot, the chief operating officer of Vantage Learning, adding that he prefers to have 300 scored responses for each item on a test. “By learning the characteristics of 300 typical responses, it can apply that learning to score a novel response,” he said.

Once primed, the software looks for patterns in about 76 different features of the responses, some of which might not be readily discernible to every human scorer, the company maintains.

Some are structural, mechanical elements, such as spelling, punctuation, syntax, and subject-verb agreement. Other features involve content: “concepts and relationships among those concepts,” said Mr. Elliot.

“It ultimately comes down to vocabulary,” he said.

All those patterns, layered together and anchored in the human-scored samples, create an effective scorer, Mr. Elliot argued.

“The bottom line,” he said, “is our engine typically matches [human] experts more often than two [human] experts can match each other.”
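Claims like this are usually stated as agreement rates. The Python sketch below computes two common ones: exact agreement (identical scores) and adjacent agreement (scores within one point). The article does not say which measure Vantage Learning uses, and the score lists are invented for illustration.

```python
def agreement_rates(scores_a, scores_b):
    """Return (exact, adjacent) agreement between two raters' scores.

    exact: share of essays given identical scores.
    adjacent: share of essays whose scores differ by at most one point.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b)) / n
    return exact, adjacent


# Hypothetical scores on a 1-to-5 scale for the same six essays.
human_1 = [5, 4, 3, 2, 4, 3]
human_2 = [4, 4, 3, 3, 5, 3]
engine = [5, 4, 3, 2, 5, 3]

print(agreement_rates(human_1, human_2))  # (0.5, 1.0)
print(agreement_rates(engine, human_1))
```

In this made-up sample the engine matches the first human exactly on five of six essays, while the two humans match on three; a real comparison would need many more essays before such a gap meant anything.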

And, the computer “doesn’t need a cigarette break, doesn’t need a cup of coffee, and scores the first and last essay the same,” he said.

The essay-scoring engine created by Knowledge Analysis Technologies uses another analytical method, called “latent semantic analysis,” that is based on a broader model of English, said Lynn A. Streeter, the business-development officer of the company, based in Boulder, Colo.

It involves creating three lexicons, or collections of words: The first is a general model of English for the typical test-taker, such as a college freshman; the second is words pertaining to the subject of the test; the third is specific to each essay question, she said.

Ms. Streeter claims that having the first “general semantic space” allows the computer to recognize student responses that might be further afield from the average. For example, she said, if the word “doctor” was consistently used in a sample essay question, “then somebody writes a test essay in which they refer to a dermatologist, in our model we’d know that it’s very close to doctor and essentially means almost same thing.”
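Latent semantic analysis is usually built on a truncated singular value decomposition (SVD) of a term-by-document matrix. The Python sketch below is a minimal, hypothetical illustration of that idea, not Knowledge Analysis Technologies' engine: it uses an invented four-document corpus, skips the three-lexicon design Ms. Streeter describes, and finds the top singular vectors by power iteration so that it needs no outside libraries. In the toy corpus "doctor" and "dermatologist" never appear in the same document, yet they end up close together because they share context words.

```python
import math


def term_doc_matrix(docs):
    """Count matrix: one row per vocabulary word, one column per document."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.split():
            A[row[w]][j] += 1.0
    return vocab, A


def matvec(M, x):
    return [sum(m * v for m, v in zip(r, x)) for r in M]


def unit(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def top_eigenpairs(C, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix (power iteration + deflation)."""
    C = [r[:] for r in C]
    pairs = []
    for _ in range(k):
        x = unit([1.0] * len(C))
        for _ in range(iters):
            x = unit(matvec(C, x))
        lam = sum(a * b for a, b in zip(x, matvec(C, x)))  # Rayleigh quotient
        pairs.append((lam, x))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        for i in range(len(C)):
            for j in range(len(C)):
                C[i][j] -= lam * x[i] * x[j]
    return pairs


def lsa_term_vectors(A, k):
    """Rows of U_k * Sigma_k from the rank-k SVD of A (terms x documents)."""
    n = len(A)
    # Eigenvectors of A A^T are A's left singular vectors; eigenvalues are sigma^2.
    C = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(C, k)
    return [[math.sqrt(max(lam, 0.0)) * u[i] for lam, u in pairs] for i in range(n)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


docs = [
    "doctor treats patient skin",
    "dermatologist treats patient skin",
    "car engine road fuel",
    "car road driver",
]
vocab, A = term_doc_matrix(docs)
vecs = lsa_term_vectors(A, k=2)
idx = {w: i for i, w in enumerate(vocab)}
print(cosine(A[idx["doctor"]], A[idx["dermatologist"]]))        # raw counts: 0.0
print(cosine(vecs[idx["doctor"]], vecs[idx["dermatologist"]]))  # latent space: close to 1
print(cosine(vecs[idx["doctor"]], vecs[idx["car"]]))            # latent space: close to 0
```

Reducing to two dimensions is what lets the never-co-occurring pair look alike; with k equal to the full rank, the latent vectors would reproduce exactly the similarities of the raw counts.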

Potential Problems

But the use of essay-scoring software faces some big hurdles before becoming a part of state or federally mandated academic assessments. For starters, the uneven availability of computers and high-speed Internet connections in schools is a problem.

In addition, several studies by Boston College researchers suggest that students perform better on essay tests when the test-delivery method—whether on paper or computer—is the same method they use for regular writing assignments.

For now, Ms. Streeter said, machine-scoring of essays is best used to grade practice tests or to help teachers wade through student writing exercises, which would let teachers assign more of those exercises. “It should be more about helping a person, than ‘you flunk,’” she said.

For example, her company’s essay-scoring tool is used in a literacy project at the University of Colorado, called “Summary Street,” in which students in grades 3-12 write summaries of book chapters they have read. The computer gives feedback on how students can improve their writing and points out concepts they have missed.

Michael K. Russell, a researcher at the Center for the Study of Testing, Evaluation, and Assessment at Boston College, suggests that essay-scoring software might be best used as a diagnostic tool, analyzing student essays to reveal misconceptions about academic topics.

Beyond that, Mr. Russell said, increased use of essay-scoring technologies must first be matched by more use of computers for student writing and classroom learning.

Coverage of technology is supported in part by the William and Flora Hewlett Foundation.

A version of this article appeared in the May 29, 2002 edition of Education Week as States Testing Computer-Scored Essays

