The federal testing program that states rely on to validate the success of their school improvement policies is in danger of losing the ability to compare current scores with those dating back to the early 1990s.
Trends in state results from the National Assessment of Educational Progress could be “significantly threatened” by the rising number of students who require accommodations for their disabilities, federal officials last month told the independent board that governs the assessment.
Because the NAEP program, known as “the nation’s report card,” historically has not reported the scores of students who receive such special assistance, states with high proportions of children needing the accommodations may be reaping artificial increases.
The National Center for Education Statistics, a branch of the federal Department of Education, is evaluating the impact of these new demographics. That work will delay the release of states’ scores on the mathematics and science tests taken in 2000 until next summer. They were to be released next spring.
When the scores are published, the National Assessment Governing Board has decided, the report will include the results from two separate samples: one in which students are excluded from the test if they need special assistance, and one in which accommodations are offered to those who require them.
“There’s the possibility that if exclusion rates vary a lot ... that it could threaten the trend,” said Gary W. Phillips, the acting commissioner of education statistics. “There will be mostly minor differences [between the 2000 results and 1996 results], but occasionally big differences. In some states, the changes are bigger than others,” he said.
NAEP’s state tests are often cited as evidence that a state’s school reforms are working.
Gov. George W. Bush of Texas, the Republican nominee for president, points to the Lone Star State’s NAEP scores, for example, as validation of the assessment-based reforms that have been a hallmark of his state’s improvement efforts. Many other states depend on the state NAEP scores to confirm gains made on their own assessment systems.
But critics of NAEP contend that the difference in test-takers means that states can no longer point to the national assessment for proof of their success.
“We’ve had all these reforms going on around the country, ... and everybody was looking to the NAEP to determine what was working,” said Richard G. Innes, a citizen activist from Villa Hills, Ky., who first pointed out that Kentucky’s reading gains between 1994 and 1998 coincided with a large jump in the number of students excluded from the test.
“We’ve lost an absolutely critical piece of information at an absolutely critical time,” asserted Mr. Innes, who is a critic of Kentucky’s wide-ranging, 10-year-old reform initiatives.
But researchers who rely on NAEP data doubt whether the rise in exclusion rates for the sample will change the overall score enough to matter.
“In order to move a whole state average with only 2 to 3 percent of kids, their scores have to be so low to have an appreciable effect,” said David W. Grissmer, an analyst for the RAND Corp., a Santa Monica, Calif.-based think tank.
Besides, he added, factors such as demographics and school-participation rates have always changed from year to year, making such comparisons specious without statistical adjustments to account for the shifts.
Mr. Grissmer has published several reports using state NAEP scores as the benchmark of achievement in evaluating states’ school improvement measures. (“RAND Report Tracks State NAEP Gains,” Aug. 2, 2000.)
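Mr. Grissmer’s point about how far a small excluded group can move a state average can be checked with simple arithmetic, sketched here in Python. Every figure in it is hypothetical and chosen only for illustration, not drawn from NAEP data: a state average of 270 points, excluded shares of 2, 3, and 9 percent, and excluded students who would have scored 20 or 40 points below the state average. The function names are likewise illustrative.

```python
def included_mean(overall_mean, excluded_share, excluded_mean):
    """Average of the students who remain after a share of the sample is excluded.

    Derived from: overall_mean = share * excluded_mean + (1 - share) * included_mean
    """
    if not 0 <= excluded_share < 1:
        raise ValueError("excluded_share must be at least 0 and less than 1")
    return (overall_mean - excluded_share * excluded_mean) / (1 - excluded_share)


def shift_from_exclusion(overall_mean, excluded_share, excluded_mean):
    """Points by which the reported average rises once the excluded group is dropped."""
    return included_mean(overall_mean, excluded_share, excluded_mean) - overall_mean


if __name__ == "__main__":
    overall = 270.0  # hypothetical state average, not a NAEP figure
    for share in (0.02, 0.03, 0.09):  # hypothetical excluded shares
        for gap in (20.0, 40.0):  # hypothetical points below the state average
            shift = shift_from_exclusion(overall, share, overall - gap)
            print(f"{share:.0%} excluded, {gap:.0f}-point gap: "
                  f"reported average rises {shift:.2f} points")
```

Under these assumed figures, dropping 3 percent of students who trail the average by 40 points raises the reported average by a little over one point, while dropping 9 percent with the same gap raises it by about four. How large the real effect is depends on how the excluded students would actually have scored, which this sketch does not know.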
Of Samples and Asterisks
The congressionally mandated NAEP program, which tests samples of students in key academic subjects, is divided into several pieces.
One portion has monitored national achievement in science since 1969, reading since 1971, and mathematics since 1973. The most recent report, based on 1999 tests, documents slight increases over the past 30 years in each of those subjects and in all three age groups tested. Mr. Phillips said participation rates haven’t changed dramatically for that assessment.
Another part of NAEP began in 1990. It gauges individual states’ achievement in reading, writing, math, and science, and also reports national results. Each subject is tested every four years, with states volunteering to provide a sample. Subjects such as history and civics are assessed less frequently and don’t deliver state-by-state results.
This past spring, NAEP gave the tests for mathematics and science on the state-by-state level. It also administered a reading test that will deliver national results.
NAEP exams are given to a sample of students who represent a state’s demography, and the results are run through intricate statistical procedures to produce a state’s score.
Because of the increasing number of students requiring testing accommodations for physical, learning, or other disabilities, NAEP has kept two separate samples since 1996. The first gives no special assistance; any child needing help is removed from the testing program. The scores from that group have been publicly reported since the program began.
The second sample offers such accommodations as extra time or an aide to read the exam to the test-taker.
The problem with the first sample is that the number of students excluded began to skyrocket in 1998, the year after federal special education law started requiring schools to outline exactly what special assistance disabled students need when they are given tests.
Of the 38 states that participated in NAEP’s 8th grade math test in both 1996 and 2000, 27 states removed higher proportions of the original sample this year because of the requests for accommodations. One unidentified state withheld up to 9 percent more of the sample than it did four years ago.
Researchers at the Princeton, N.J.-based Educational Testing Service, which designs the test, are double-checking NAEP scores to see if the increase in exclusions led to undue gains in scores in the sample that is usually reported.
Any state that receives an extra boost because of changes in the sample will be noted with an asterisk when the scores are published, Mr. Phillips said.
‘A Balancing Act’
The National Assessment Governing Board, which sets NAEP policy, decided to publish both scores from the 2000 assessments so the public can see the differences in scores between the two sets of students, one member said.
“We’re in a period of transition,” said John H. Stevens, the executive director of the Texas Business and Education Coalition in Austin. “We need to be as honest as we can with the public and show the results of both groups.”
In the future, the board will have to decide how—or even whether—to preserve the long-term trend while still meeting NAEP’s goal of measuring achievement in schools as they exist today.
“One can’t be a slave to the long-term trend,” said Mr. Stevens, who is on the board’s reporting and dissemination committee. “Occasionally, you’re going to have bumps in the road on this long-term trend.”
In the long run, NAEP could switch to reporting the scores of the sample that receives accommodations, RAND’s Mr. Grissmer said.
“It’s good to publish both now,” he said. But if the number of disabled students excluded from the original sample continues to rise, he said, “then you start a new trend so you can track the future tests.”
Others say such a transition would be difficult for the policymakers who decide how NAEP is run and how its scores are reported.
“It’s a difficult balancing act,” said James W. Pellegrino, a professor of cognitive studies at Vanderbilt University’s Peabody College in Nashville, Tenn., and the chairman of a panel that evaluated NAEP for the National Research Council in 1998. “Heretofore, they haven’t had to worry about this.”