One of the common beliefs about testing in the era of accountability hinges on the notion that student test scores improve rapidly in the first few years of a new testing program as teachers and students adjust, followed by a “plateau” in scores after the early gains.
The assertion underpins criticism of the federal No Child Left Behind Act, which requires states to raise scores regularly over a 12-year period.
But a new study, released yesterday, offers evidence that while this “plateau effect” in test scores does appear in some states, it is not pervasive across the nation.
“There’s as likely to be an increase or a decrease in scores as a plateau,” said Jack Jennings, the president of the Washington-based Center on Education Policy, a nonpartisan research group. “The idea that you always hit a plateau just isn’t true.”
The report is a companion to one released earlier this month that found state test scores appear to be improving across all proficiency levels in the wake of the federal NCLB law, which was enacted in early 2002. (“NCLB Found to Raise Scores Across Spectrum,” June 17, 2009.)
No Common Patterns
The new study examines 55 state test-score trends across 16 states. Each of the trend lines represents at least six years of test-score data between 1999 and 2008. None of the states studied changed its tests over that period or lowered its “cut scores,” the number or percentage of questions students must answer correctly to be deemed “proficient.”
The concept of the plateau effect holds that the largest gains in test scores will appear in the earliest years of a testing program, as teachers drill students on the new item formats, and those students on the cusp of proficiency make gains. After districts have culled the “low-hanging fruit,” the thinking goes, it becomes more difficult to bump up the scores of students with learning challenges, and overall scores level off.
But the report found no widespread pattern of plateaus. Of the 55 trend lines studied, 15 exhibited a plateau. Twenty-one showed steady increases in the percentage of students scoring at the proficient level on the tests, while 19 showed a zigzag pattern that, despite some downturns, indicated upward momentum overall.
The report complicates the research literature on plateaus: A number of earlier studies did find evidence of the phenomenon. Mr. Jennings surmised that some of those studies were conducted on state data from the 1980s and 1990s. In those years, before state and federal accountability regimes put a premium on using fresh test items each year, states commonly recycled questions, making it easier to prep students.
The report also found that for a third of the trend lines studied, the greatest score gains were made in the 2003-04 period, during which testing under the NCLB law was fully established, suggesting that the higher stakes accompanying the federal law did cause districts and teachers to redouble efforts to raise scores.
“It’s pretty hard to look at that and not think that NCLB had an effect,” Mr. Jennings said.
Bruce Fuller, a professor of education and public policy at the University of California, Berkeley, generally agreed with the report’s assertion that in the states studied, accountability systems appear to be having a sustained effect at raising student test scores.
But he pointed out that such gains generally haven’t been reflected on other measures, such as the National Assessment of Educational Progress.
The CEP analysis does not separate scores by grade levels, so it’s unclear whether the accountability systems are raising basic numeracy and literacy skills or more complex skills at higher grades, he added.
“If they’re just squishing grade levels together, we can’t get at that question,” said Mr. Fuller, who has studied test-score plateaus in California.
The higher scores may not reflect greater student learning, he said.
“Perhaps trend lines climb mainly in states that rarely change their testing regimes. The exams simply become more familiar to teachers over time. The dilemma is that when states change tests, the results cannot be reliably tracked,” Mr. Fuller said.
The report does not attempt to explain why the test-score patterns vary from state to state, and Mr. Jennings sounded a note of caution on that subject.
“It should make us all a little more cautious about believing all test results are sacrosanct,” he said. “You do get different patterns, and it could be because of different types of tests, an influx of immigrant kids into an area, or how teachers are teaching.”
A version of this article appeared in the August 12, 2009, edition of Education Week.