Assessment Opinion

Does High-Stakes Testing Hurt Students?

By Laurence Steinberg, February 5, 2003
Does high-stakes testing hurt students? Read the early evidence with caution.

According to a recent front-page story in The New York Times, high-stakes testing does more harm than good, increasing the proportion of students who drop out of school, decreasing the proportion who graduate, and diminishing students’ performance on standardized tests of achievement. But before President Bush and his education team abandon their efforts to hold students, teachers, and schools accountable, they should read the actual report on which the news story was based.

The study contained in the report, conducted by Arizona State University researchers Audrey Amrein and David Berliner and paid for by several affiliates of the National Education Association, analyzed trends over time in student achievement and school completion in states that have implemented high-stakes testing and compared these trends to national averages for the same indicators. If student performance following the introduction of the test appeared to decline relative to the national trend over the same time period, the authors concluded that the testing had a negative effect. (“Reports Find Fault With High-Stakes Testing,” Jan. 8, 2003.)

Based on their analyses, the authors compiled a score card that tallied the number of states in which testing had a negative effect, the number of states in which the effect was positive, and the number in which the impact of testing was mixed or unclear. Because the number of states in the negative column exceeded the number in the positive column, they concluded that, on average, high-stakes testing is bad.

The so-called declines the authors used to categorize states into the winning or losing columns are often so small as to be meaningless, however. Consider, for example, the “strong” evidence that the implementation of high-stakes testing in New York had adversely affected school completion. From the sound of it, one would think that thousands of students had dropped out as a result of the testing. In fact, during the period after the 1994 introduction of graduation exams in New York, the state’s dropout rate didn’t increase at all—it remained flat, whereas during the same period, the national dropout rate declined by 1 percent. And what was the staggering drop in New York’s graduation rate following the introduction of the test? The rate declined by three-tenths of 1 percent during a time when graduation rates remained unchanged nationally. Nevertheless, on the basis of this “strong” evidence, New York ends up in the column of states whose students were ostensibly harmed by testing.
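
The comparison described above can be sketched in a few lines. This is a minimal illustration, assuming the report's classification comes down to the sign of a state's change minus the national change over the same period; the helper names `relative_change` and `classify` are hypothetical, and the inputs are the New York figures quoted above, in the report's own units.

```python
def relative_change(state_change: float, national_change: float) -> float:
    """State change minus national change over the same period (hypothetical helper)."""
    return state_change - national_change


def classify(rel: float, higher_is_better: bool) -> str:
    """Assign a score-card column using only the sign of the relative change."""
    if rel == 0:
        return "unclear"
    improved = rel > 0 if higher_is_better else rel < 0
    return "positive" if improved else "negative"


# New York figures quoted above: dropout rate flat vs. a national decline of 1;
# graduation rate down 0.3 vs. no national change.
dropout_rel = relative_change(state_change=0.0, national_change=-1.0)
graduation_rel = relative_change(state_change=-0.3, national_change=0.0)

print(dropout_rel, classify(dropout_rel, higher_is_better=False))
print(graduation_rel, classify(graduation_rel, higher_is_better=True))
```

Both indicators land in the "negative" column. Because the sign test ignores magnitude, a gap of three-tenths of 1 percent counts exactly as much as a large one, which is the weakness described above.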

By the time one gets to the authors’ summary table, though, much less the hyperbolic press release that trumpeted the report, the actual sizes of the effects that are under discussion are long forgotten. In other words, the list of states where students were allegedly harmed by testing could include states whose indicators barely changed as well as those where they changed a great deal. In fact, there were many of the former and few of the latter. Indeed, of all the states whose graduation rates declined following the implementation of testing, none saw a decline that differed from the national average by more than 1.6 percent. Moreover, the average relative decline in graduation rates among states whose rates fell was smaller than the average relative increase in graduation rates among states whose rates rose. The data showing changes in achievement-test scores are equally meaningless, with the putative effects of testing usually smaller than the margins of error in the tests.

Social scientists generally are interested not only in the size of an effect, but also in whether the result is statistically significant. Yet nowhere do the authors of this report say whether the effects they claim to have uncovered are statistically significant, most likely because they are not. (I corresponded with Ms. Amrein and learned that no significance-testing had been done.) This is important, because findings that look impressive are frequently chance occurrences. When a trend being analyzed is brief, it is easy to be fooled into thinking it is meaningful. Suppose, for example, a coin I flipped four times in a row landed on heads each time. Would you be willing to believe that I had discovered a magic coin that always turned up heads, or would you want to see a few more flips? In the analyses presented in this report, not only are the effects often minuscule, but few of the trends the authors describe are long enough to support any reliable conclusions about the impact of testing on anything.
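
The coin example can be made concrete. Assuming an ordinary fair coin and independent flips, the chance of four heads in a row is (1/2)^4 = 1/16, or about 6 percent, so a perfectly normal coin produces that streak fairly often. The sketch below computes the exact value and, for contrast, the value for a longer run; the function name `prob_all_heads` is hypothetical.

```python
from fractions import Fraction


def prob_all_heads(n_flips: int) -> Fraction:
    """Probability that a fair coin lands heads on every one of n independent flips."""
    return Fraction(1, 2) ** n_flips


four = prob_all_heads(4)
twenty = prob_all_heads(20)
print(four, float(four))  # 1/16 0.0625
print(twenty, float(twenty))
```

Four heads is weak evidence of a magic coin; twenty in a row would be far harder to attribute to chance. Short trend lines are in the same position as the four-flip streak.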

It is conceivable, of course, that implementing high-stakes testing could influence dropout or graduation rates, although the authors of this report, as well as those who funded it, will have a hard time explaining why, in several states, the trend lines point to declining dropout rates and rising graduation rates after the introduction of testing. (I don’t place much credence in these results, either, because they, too, are unlikely to be statistically significant.) But the authors’ contention that the implementation of high-stakes testing depressed students’ performance on tests like the SAT or ACT is just plain silly. Performance on these tests is strongly linked to students’ socioeconomic status and is marginally, if at all, affected by what takes place in the classroom.

And then, of course, there is what social scientists call the third-variable problem. During the period following the implementation of testing, plenty of other factors change as well, and many of these factors could conceivably influence dropout and graduation rates as well as achievement-test scores. Comparing each state’s trend to the national trend does not solve this problem, because factors that may have changed in a particular state may not have changed in the same way across the nation.

One potentially important factor, for example, is the size of the state’s Hispanic population, because Hispanic youngsters drop out of school at a much higher rate than do other students. The two states where the relative increase in the dropout rate following the introduction of testing appears to be large enough to be worrisome, Nevada and New Mexico, are states with high and rapidly growing Latino populations. In fact, five of the eight states that showed a relative increase in their dropout rates following the introduction of testing are states with large Latino populations that grew dramatically during the time frame examined in the report (besides Nevada and New Mexico, these are New York, Texas, and Florida). In all likelihood, this change in demographics, and not the implementation of testing, led to higher rates of dropping out and lower test scores.

A sensible reading of the evidence to date suggests that high-stakes testing so far has had neither the dramatic beneficial effects hoped for by its proponents nor the catastrophic ones feared by its detractors. But even this conclusion is not cautious enough. It will take many years, perhaps even decades, to assess the impact of such a dramatic change in educational policy and practice on student achievement.

Does high-stakes testing encourage teaching to the test? Probably. But this is not a problem if the tests that teachers are teaching to are measuring things we want our students to learn. As long as this is the case, there is nothing wrong with ensuring that students have mastered what we expect them to know before promoting them to the next grade level. How can anyone oppose that?

Laurence Steinberg is the distinguished university professor of psychology at Temple University in Philadelphia.
