Standards & Accountability

Stanford Report Questions Accuracy of Tests

By Debra Viadero, October 6, 1999

How often will a student who really belongs at the 50th percentile according to national test norms actually score within 5 percentile points of that ranking on a test?

The answer, a Stanford University statistician says in a new report, is only about 30 percent of the time in mathematics and 42 percent in reading.

For More Information

“How Accurate Are the Star National Percentile Rank Scores for Individual Students?--An Interpretative Guide” is available online at www.cse.ucla.edu/CRESST/Reports/drrguide.html.

David R. Rogosa says his calculations shed new light on traditional, technical methods of describing test accuracy.

“Here’s a different look at what we’re getting from standardized tests expressed in what I hope are common-sense forms,” said Mr. Rogosa, an associate professor in Stanford’s school of education. “And the question that I’m putting out is ‘Are these numbers good enough?’”

His findings add to a growing debate about the use of tests for important decisions such as student promotion or teacher pay, illustrating the point that even the best tests are not perfect.

Last month, CTB/McGraw-Hill, one of the nation’s largest test publishers, apologized for a calibration error that skewed percentile rankings for students taking the popular TerraNova test in at least six states. The most serious consequences occurred in New York City, where officials used the rankings to determine which students should attend summer school and which would be held back a grade. (“Error Affects Test Results in Six States,” Sept. 29, 1999.)

Tracking Test ‘Uncertainty’


“The lessons that this study brings are very important now in days of attention to accountability systems that have high stakes attached,” said Robert J. Mislevy, a distinguished research scientist at the Educational Testing Service in Princeton, N.J. “There’s a certain amount of uncertainty inherent even in a well-controlled system.”

He added that, “even if there had been no equating problems in New York City, some of those kids who had to stay for summer school--if they had taken the test on another day--might have passed it.”

Traditionally, experts describe test accuracy in terms of reliability coefficients, which are fractions between 0 and 1, with 1 being perfect accuracy.

For his study, Mr. Rogosa focused on the reading and math portions of the Stanford Achievement Test-9th Edition, which is published by Harcourt Educational Measurement of San Antonio. The reading test has a reliability coefficient of between .94 and .96 for grades 2 through 11.

That indicates a very high probability that the score on the test reflects a student’s actual standing. But it is not so high that every student’s achievement level will be identified correctly, especially on a test given to millions of students.

The math test’s reliability coefficient is .94 or .95 for grades 2 through 8. It drops as low as .87, however, for higher grades.

“When people see a number like .95, they say that’s got to be awfully good,” Mr. Rogosa observed. “We’re better off knowing exactly what we’re getting for our money.”
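To see how a coefficient like .95 can translate into a hit rate well below what the number suggests, here is a minimal Python sketch. It is not Mr. Rogosa's actual procedure. It assumes a simple classical model: scores in the norm group follow a normal curve rescaled to a standard deviation of 1, measurement error is normal with variance equal to 1 minus the reliability, and a student's "true" percentile is the rank the student would earn with no measurement error. The function name and its parameters are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed norm-group scale: mean 0, SD 1


def prob_within_band(reliability, true_percentile=50.0, band=5.0):
    """Chance that an observed percentile rank lands within +/- band points
    of the true percentile, under the simplified model described above.
    The reliability must be below 1 so the error spread is positive."""
    # Standard error of measurement on the unit-variance scale.
    error_sd = sqrt(1.0 - reliability)
    # Score the student would earn with no measurement error (assumption).
    true_z = STD_NORMAL.inv_cdf(true_percentile / 100.0)
    observed = NormalDist(mu=true_z, sigma=error_sd)
    # Score cutoffs matching the lower and upper edges of the band.
    low_z = STD_NORMAL.inv_cdf((true_percentile - band) / 100.0)
    high_z = STD_NORMAL.inv_cdf((true_percentile + band) / 100.0)
    return observed.cdf(high_z) - observed.cdf(low_z)


if __name__ == "__main__":
    for rel in (0.87, 0.94, 0.95, 0.96):
        print(f"reliability {rel:.2f}: {prob_within_band(rel):.2f}")
```

Under these assumptions, a reliability of .95 puts a true 50th-percentile student inside the 45th-to-55th band a bit more than 40 percent of the time, in the same neighborhood as the reading figure cited in the report, and the chance falls as the coefficient drops toward .87.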

Longer Tests Needed?

Mr. Rogosa said his findings might also apply to other commonly used tests, most of which have similar reliability coefficients. He chose the reading and math portions of the Stanford-9, he said, because they are longer and, thus, more reliable than test sections covering other subjects, such as social studies or science.

A spokesman for Harcourt did not return repeated phone calls last week.

Mr. Rogosa calculated the standard errors for the tests and then translated the numbers to scenarios intended to make them easier to understand.

What are the chances, he notes in one such example, that two students with identical “real achievement”--a hypothetical gold standard for the tests--will score more than 10 percentile points apart on the same test? For two 9th graders who are really at the 45th percentile, the answer is 57 percent. In 4th grade reading, the probability is 42 percent.
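The two-student scenario can be approximated the same way. The sketch below is an illustration under assumed conditions, not the report's method: norm-group scores are treated as normal with a standard deviation of 1, each student's error is independent and normal with variance equal to 1 minus the reliability, and both students share the same error-free score. The function name, the `steps` parameter, and the reliability values passed in are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed norm-group scale: mean 0, SD 1


def prob_gap_exceeds(reliability, true_percentile=45.0, gap=10.0, steps=2000):
    """Chance that two students with the same true percentile score more
    than `gap` percentile points apart (simplified classical model)."""
    center = STD_NORMAL.inv_cdf(true_percentile / 100.0)
    score = NormalDist(mu=center, sigma=sqrt(1.0 - reliability))
    total = 0.0
    # Midpoint rule over the quantiles of the first student's score.
    for i in range(steps):
        x1 = score.inv_cdf((i + 0.5) / steps)
        rank1 = STD_NORMAL.cdf(x1)  # first student's rank as a fraction
        low = rank1 - gap / 100.0
        high = rank1 + gap / 100.0
        # Chance the second student ranks below `low` or above `high`.
        below = score.cdf(STD_NORMAL.inv_cdf(low)) if low > 0.0 else 0.0
        above = 1.0 - score.cdf(STD_NORMAL.inv_cdf(high)) if high < 1.0 else 0.0
        total += below + above
    return total / steps


if __name__ == "__main__":
    for rel in (0.90, 0.95):
        print(f"reliability {rel:.2f}: {prob_gap_exceeds(rel):.2f}")
```

With a reliability of .95, this simplified model gives a probability of roughly 0.4 that the two scores differ by more than 10 percentile points, and a lower reliability pushes that figure higher.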

Such a wide range in scores does not, however, argue for discarding such tests, Mr. Rogosa writes in his report. Longer tests might enhance accuracy, “especially if readers interpret these results to indicate that current tests do not have adequate accuracy.”
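How much a longer test might help can be roughed out with the Spearman-Brown formula, a standard psychometric rule of thumb that the article itself does not mention. It assumes the added questions behave like the ones already on the test. The sketch below applies it to the lowest math reliability cited above (.87); the function names are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made `length_factor` times as
    long, assuming the added items are parallel to the existing ones."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def length_factor_for(reliability, target):
    """Length multiple needed to raise `reliability` to `target`
    (the Spearman-Brown formula solved for the length factor)."""
    return target * (1.0 - reliability) / (reliability * (1.0 - target))


if __name__ == "__main__":
    print(f"doubling a .87 test: {spearman_brown(0.87, 2.0):.3f}")
    print(f"length multiple to reach .95: {length_factor_for(0.87, 0.95):.2f}")
```

By this formula, doubling a test with a reliability of .87 would raise the coefficient to about .93, and reaching .95 would take a test nearly three times as long.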
