Efforts in more than half the states to tie major consequences to students’ test scores are translating into academic gains, according to the latest in a series of studies on the policy approach known as high-stakes testing.
Or then again, maybe they’re not.
The report, “Reconsidering the Impact of High-Stakes Testing,” is available from the Education Policy Analysis Archives.
The study, published this month in the online journal Education Policy Analysis Archives, draws on eight years of national testing data to compare states with traditional “low stakes” testing policies against those with “high stakes” systems. Under high-stakes policies, students’ scores are used to decide which teachers or schools win cash bonuses, whether students graduate or move on to the next grade, or which schools are subject to takeover by their districts or states.
The report follows half a dozen other studies over the past year that have used similar techniques to evaluate the effects of such accountability systems. (“Study Finds Higher Gains in States With High-Stakes Tests,” April 16, 2003.)
“I was intrigued by the fact that different researchers with different ideological stances were coming to different conclusions from the same data,” said Henry I. Braun, the author of the new study. “I was also motivated by a sense that the world of research is very complex and we are not, in our research worlds, respectful enough of that complexity.”
Sorting It Out
Mr. Braun, a statistician with the Princeton, N.J.-based Educational Testing Service, used four different methods to compare changes in states’ scores on National Assessment of Educational Progress mathematics tests between 1992 and 2000.
The comparisons pitted the 18 states that some previous researchers have identified as having high-stakes systems against 32 with lower-pressure accountability systems.
Looking first at overall changes in the states’ 4th and 8th grade test scores over that period, Mr. Braun, like most of his predecessors, found that students’ academic gains were greater in states, such as Texas and North Carolina, that had high-pressure testing systems.
What’s more, he said, the trend could not be explained by statistical errors or the fact that some of the states showing the biggest improvements had also been excluding growing percentages of special education students from the tests.
In the 4th grade, the difference in mean scores between the high-stakes and low-stakes states was 4.3 score points; in 8th grade, it was 3.99 score points.
The opposite occurred, though, when Mr. Braun took a look at how cohorts of students fared on the tests over time. (He compared 4th graders’ scores with the 8th grade scores in the same states four years later.)
That time around, the improvements in academic achievement were greater—albeit to a lesser degree—in the states with low-pressure testing systems.
Mr. Braun said the differing results didn’t surprise him.
“You cannot look at high-stakes testing in isolation from other things going on in the state,” he said. “Many education reforms can be assisted or thwarted by other education reforms going on at the same time.”
In an effort to take a broader look, Mr. Braun reconfigured the data to factor in a measure that rated states on their education activism. It assigned states grades based on whether they had enacted—or were about to enact—22 school improvement efforts, such as professional standards for teachers or subject-matter standards.
But he found little correlation between the level of states’ education activism and their students’ test-score changes over time.
Mr. Braun also looked at changes in scores for the bottom 25 percent of students in each of the states. Gains were greater in the states that put more pressure on students or schools for test-score improvements.
“All of this is about which states you include and which states you do not include,” said David C. Berliner, an education professor at Arizona State University in Tempe whose own study on high-stakes testing helped spark the spate of research on the subject. (“Reports Find Fault With High-Stakes Testing,” Jan. 8, 2003.)
For his study, co-written with Audrey L. Amrein, Mr. Berliner compared states’ academic gains against the national average. He and Ms. Amrein found that most of the high-pressure states saw decreases in 4th grade math scores after adopting their testing programs. At the 8th grade level, a majority of high-stakes states gained relative to the national average.
Referring to Mr. Braun, Mr. Berliner added: “He was able to find results, but my guess is that our study and everybody else’s is still going to be subject to criticism.”