Drop in Scores On Reading Test Baffles Experts
The National Assessment of Educational Progress and the U.S. Education Department are investigating an "anomaly" in the results of NAEP's 1986 reading test that has confounded testing experts and delayed the reporting of the results by at least four months.
The test scores, which showed inexplicably large drops in reading performance among 9- and 17-year-olds since the previous assessment, are apparently the first such abnormal results NAEP has experienced in its 18-year history.
"This is posing a real mystery here for us," said Archie E. Lapointe, NAEP's executive director.
Officials of the assessment, which is administered by the Educational Testing Service under contract to the Education Department, last week submitted a report on the matter to NAEP's technical advisory panel.
"I'm pretty disgusted by this," said Chester E. Finn Jr., the department's assistant secretary for educational research and improvement. "There is a strong possibility that something went awry with the test design or administration."
Mr. Lapointe said that safeguards written into the new contract for the assessment, which require extensive field testing of design changes, make it unlikely that a similar incident will occur in the future.
Nevertheless, the agency will change its procedures if testing experts uncover a problem in NAEP's techniques, Mr. Lapointe said.
"We've involved every expert in and out of education in this country--and some in Canada--to come and help us solve this problem," he said.
"No one has come up with a flaw," he continued. "It's hard to imagine someone might."
"On the other hand," he said, "there's something wrong here."
The Congressionally mandated assessment, often called "the nation's report card," tests a national sample of about 100,000 9-, 13-, and 17-year-olds in reading, writing, and other subjects.
By the next decade, NAEP is expected to expand its tests to provide state-by-state comparisons of student-achievement data.
The 1986 assessment, the largest ever conducted, covered reading, mathematics, science, computer competence, U.S. history and literature, and the educational achievement of language-minority students.
Abnormal results were seen only in the reading assessments for the oldest and youngest groups of students tested.
After the 1986 assessment, NAEP officials had planned to produce a study showing trends in reading performance since 1971.
A similar report, published after its 1984 assessment, showed that students' reading performance had improved between 1971 and 1984, but that it had leveled off in the 1980's.
But the data showing sharp drops in 9- and 17-year-olds' reading performance between 1984 and 1986 were "unbelievable," according to Mr. Lapointe.
"The drops were so significant that, if true, the whole world would have recognized that we have a reading problem," he said.
But the fact that 13-year-olds' reading scores, as well as scores on other tests in the battery, did not appear abnormal indicated that the data were flawed, he said.
Furthermore, he added, officials checked with standardized-test publishers and state assessment officials to determine whether other tests had shown similar declines in reading performance. They had not, he noted.
NAEP officials then chose to delay publication of the trend study, originally scheduled to be released last September, rather than release it with questionable results, and to seek possible causes for what they called the "anomaly."
According to Mr. Lapointe, these possibilities included "technical" causes, such as statistical procedures and sampling techniques; "substantive" causes, such as the types of questions asked on the 1986 test; and "esoteric" causes, such as the fact that the assessment was administered the day the space shuttle Challenger exploded, which may have disturbed some test takers.
All of these possibilities, he said, turned out to be "blind alleys."
NAEP officials' efforts to find an explanation for the test results have not satisfied the Education Department, which has launched its own study of possible flaws in the assessment.
But until the true cause of the abnormal results can be determined, said Mr. Finn, "there is a definite possibility something went awry in reading, not the test."
The department's review of the incident, led by Edward Haertel, professor of education at Stanford University, is expected to be completed in March.
In the meantime, NAEP officials say they plan to publish a trend study after the 1988 assessment, which is currently under way.
Using what statisticians call "bridge" techniques, they are administering the 1986 and 1984 tests to a sample of students this year, in order to be able to compare results over time.
In the future, assessment officials said, they will be more careful in changing the design or administration of the tests between assessments. The new contract for the assessment, awarded to ETS last October, contains a clause that requires extensive field testing of any new design.
"Any changes in the future in design must be done experimentally first," Mr. Lapointe said. "Only after they have proven to work can they be implemented."
"This slows down the pace of change," but the delay is reasonable, he added, noting that such safeguards are essential if NAEP is to expand as envisioned.
"We're trying to learn from this experience," added Mr. Finn.