Testing's Ups and Downs Predictable
When Washington state education officials released the results from the third year of their testing program, they celebrated. Fourth and 7th graders' scores had increased in just about every subject.
Across the country, however, their counterparts in Maryland had somber news. For the first time in the seven-year history of that state's program, scores dipped, though only slightly.
Both announcements made headlines in their respective states. But the stories could have been written long ago by testing experts. Test scores follow a predictable cycle, researchers have found. They start low, rise quickly for a couple of years, level off for a few more, and then gradually drop over time.
"The pattern is pretty much universal," said Daniel Koretz, a senior social scientist at the RAND Corp., a Santa Monica, Calif.-based research organization, and a professor at Boston College. "There's no doubt about it, it's going to happen."
In addition to revealing the predictability of testing outcomes, those findings raise questions about whether tests are valid measures of progress toward the academic standards set by states.
In an attempt to beat the odds and keep test scores rising past the mark at which they usually start to slide, state officials and standards advocates in Maryland and Washington state say they are putting a number of programs in place.
Maryland's include providing professional development in reading for teachers at every grade level, applying cognitive research to classrooms, imposing middle school reforms, and working to raise minority achievement.
"What we have to work on is building internal capacity in every school," said Nancy S. Grasmick, Maryland's superintendent of schools.
But test researchers say scores stagnate or fall because schools do the easy things first and then fail to address deeper needs.
The generous early gains in scores occur because students and teachers become familiar with the tests and their content, according to experts. Teachers start to tailor their instruction to the exams, and as they discover the types of questions that are asked—and sometimes even the precise questions—they drill students on how to perform well on the tests.
In Maryland's case, the state has created an assessment that is worth teaching to, Ms. Grasmick said. It includes questions that require students to perform experiments, write essays, and do other tasks that demonstrate their knowledge. "The tests should drive the quality of instruction in every classroom," she said.
But experts say the cycle of test scores is consistent regardless of the type of test.
The leveling off and eventual decline occur because the schools haven't made costly changes such as reducing class sizes or improving teacher quality, according to Robert L. Linn, a professor of educational measurement at the University of Colorado at Boulder.
"There's only so much you can do," said Mr. Linn, who is the chairman of the National Research Council's Board on Testing and Assessment. "You have to have some systemic changes ... things that cost money and are expensive to fix."
But states often are not ready to spend enough to make such changes, some observers say.
"The tests are pretty comprehensive," said William A. Firestone, a professor of educational policy at the graduate school of education at Rutgers University in New Brunswick, N.J. "After that, it depends an awful lot on individual initiative. It's not as systemic as you would have thought."
At the Start
In the current rush to create new exams, most states are in the early stages of the testing cycle. And most have performed as Mr. Linn, Mr. Koretz, and their colleagues would predict.
When Virginia reported its second-year scores last fall, performance improved over 1998 in every grade and in every subject. California boasted similar results in 1999 when it administered the Stanford Achievement Test-9th Edition for the second consecutive year.
In the three years of Vermont's testing program, the percentage of 4th graders who have met or exceeded the state's standards in reading and math has risen every year, but results for 8th and 10th graders are mixed.
Washington state has a third year of data to cite on its 4th grade tests. In mathematics, only 21.4 percent of 4th graders met the standards in 1997, but 37.3 percent did last year. In reading, the rate of those reaching the standard rose from 42.7 percent to 59.1 percent over the three-year period.
Of the four subjects the state assesses, scores have declined only on the writing test. That phenomenon may have more to do with how the tests were scored than with the quality of student work, according to William W. Porter, the executive director of the Partnership for Learning, a Seattle-based group of community and business leaders that promotes school improvement. The state is working with its contractor to ensure that questions on the writing exam clearly define the tasks expected of students; unclear questions may have contributed to the drop in scores.
Of the states in the early stages of testing, only Massachusetts failed to post significant gains in the second year of its new assessment.
The lack of a bounce may have happened because of the "late introduction and rapid pace of change in the [state's] curriculum frameworks," said S. Paul Reville, the executive director of the Pew Forum on Standards-Based Reform and a lecturer at Harvard University's graduate school of education.
"The standards have not had sufficient time to penetrate the field, and the standards are what's driving the change," he added.
Texas students last spring completed their sixth consecutive year of rising scores at just about every grade level and in almost every subject tested on the Texas Assessment of Academic Skills.
Maryland reported similar progress in the first six years of its Maryland School Performance Assessment Program, known as MSPAP.
In the first round of testing in 1993, only 31.7 percent of Maryland students passed the test. The passing rate rose steadily until 1998, when it reached 44.1 percent. But in December, Ms. Grasmick said that, for the first time, students had not performed better than in the previous year.
"Am I concerned?" Ms. Grasmick said in announcing the 1999 scores. "Certainly. We'd all like to see gains every year. But the fact is, we know we are on the right track with six years and a lot of hard work behind us."
Low scores at the start, followed by steady increases, can be attributed to the uniqueness of the Maryland test itself, Mr. Koretz said. Some of the items required students to perform projects such as scientific experiments that teachers found difficult to explain. By the second year, though, teachers were prepared to give students the help they needed.
"The easy things have been done. Now, people are really probing," Ms. Grasmick said.
Even before last year's MSPAP results came out, the Maryland board of education had adopted an initiative called "Every Child Achieving." The plan requires districts to continually monitor students' progress toward the state's performance standards. For students who don't meet the standards by the 8th grade, districts will offer summer intervention programs and write individualized plans explaining how the students' high schools will help them catch up.
To help teachers improve, the state will require that they all be trained in how to teach reading. In addition, middle and high school teachers will have to have college majors in their content areas, and experienced teachers will be assigned as mentors to rookies.
In a separate initiative, Ms. Grasmick said, researchers from Johns Hopkins University in Baltimore are helping schools—especially low-performing ones—apply the latest findings of cognitive research.
Focus on Improvement
Meanwhile, in Washington state, Mr. Porter said, officials and citizen-activists are preparing similar initiatives, hoping to sustain gains beyond the point where test researchers suggest they will fade.
But researchers such as Mr. Koretz suggest that will be difficult to do. "The notion that there will be continuous improvement is a little optimistic at best," Mr. Koretz said. "You can teach them more, and you can teach them faster, but at some point, you're going to top out."
And predictably, once the decline hits, policymakers often get impatient, Mr. Linn said, and start searching for a new test. That simply starts the whole cycle over: The first round of testing yields low scores; educators start teaching to the new test; scores start to rise; and eventually they level off, and then ...
Vol. 19, Issue 20, Pages 1, 12-13. Published in Print: January 26, 2000, as Testing's Ups and Downs Predictable