More Errors Are Seen In the Scoring of Tests, Boston Researchers Say

The report, “Errors in Standardized Tests: A Systemic Problem,” is available from The National Board on Educational Testing and Public Policy. (Requires Adobe’s Acrobat Reader.)

The number of human errors identified in standardized-test results has risen dramatically in recent years, calling into question whether such exams should be the only measure for such high-stakes decisions as school ranking, student graduation, and promotion to the next grade, says a report that examines those mistakes.

The fallout from inaccurate scores can mean that a qualified teacher is not certified, a competent student is not allowed to graduate, or a literate 3rd grader cannot enter 4th grade with her peers, said Kathleen Rhoades, an author of the report.

Released June 5 by the independent National Board on Educational Testing and Public Policy, based at Boston College’s school of education, the study found that on all standardized tests given in 1976, only one mistake by the testing industry was reported. That number jumped to 14 in 1999 and has stayed relatively steady since then, the report says.

Ms. Rhoades and co-author George Madaus, a senior fellow with the National Board on Educational Testing and Public Policy, worry that the numbers will climb higher as states scramble to satisfy the accountability demands in the “No Child Left Behind” Act of 2001.

Between the lack of oversight in the testing industry and the increased demands that will be placed on testing companies as a result of the federal law, the future “is ripe for the proliferation of undetected human error in educational testing,” the report warns.

More Tests, More Scrutiny

One reason for the apparent increase could simply be that more testing is taking place now, said Collin Earnst, a spokesman for the Boston-based Houghton Mifflin Co., one of the nation’s leading test publishers. “If we add another 5 million cars to the road, will we expect more car accidents? We probably would,” he said.

The proliferation of testing is not the only factor, however, according to Ms. Rhoades, a graduate research assistant at the education school. Test scores are scrutinized much more carefully now, she said, because they are used so prominently to make important decisions.

Using newspaper accounts and past research, the Boston College report chronicles some recent blunders made in the testing industry and their dramatic effects.

For example, in 1999, a statistician with the Tennessee education department noticed that two-thirds of the state’s scores on the TerraNova test had decreased, and asked the testing company, CTB/McGraw-Hill, to review the scores, according to the report.

Eventually, the company discovered that “a programming error caused the percentile rankings on the TerraNova to be too low at the low end of the scale and too high at the high end of the scale,” the report says.

That lapse led to incorrect scores for roughly 250,000 students in six states.

Michael H. Kean, the vice president of public and governmental affairs for the Monterey, Calif.-based CTB/McGraw-Hill, said he had not read the report and thus declined to comment.

‘Caveat Emptor’

It is not always easy to examine data from testing companies closely, the report says. The industry itself is “shrouded in secrecy,” and testing companies “often classify their information as confidential and release only bits,” the report maintains.

Houghton Mifflin’s Mr. Earnst disagreed. “Everything in this industry of educational testing is very visible,” including test scores and the tests that are given, he said. “A lot of the processes for how testing companies are selected are public record.”

Many school districts and state education departments select testing companies based on their past performance, said Lee Baldwin, the president of the National Association of Test Directors.

“What you depend on with a contractor is that they are going to give you scores that are accurate,” he said. “If they don’t, it raises serious questions with their credibility.”

Mistakes come with the testing territory, he added. “No system is perfect; anything we do is subject to error,” said Mr. Baldwin, who is also the senior director of student assessment and program evaluation for the 160,000-student Orange County schools in Florida.

Still, districts and state education departments should minimize errors by implementing “careful procedures” on how tests are scored and how the outcomes are recorded and measured, he said.

Passage of the No Child Left Behind law, which requires annual testing in grades 3-8 and once during the high school years, makes such precautions more imperative, Mr. Baldwin said. The testing industry is rapidly expanding, he pointed out, and because there are no industry regulations, “almost anyone can claim to be a test publisher and test scorer.”

Consequently, Mr. Baldwin said: “Caveat emptor,” or let the buyer beware.

Michelle Galley