NAEP's 'Anomaly' Blamed on Changes In Test's Design
BOULDER, COLO.--The "anomalous" results of a 1986 reading assessment that showed steep drops in performance among some students resulted from changes in test design and administration, and did not reflect an actual decline in achievement, a panel of experts has concluded.
Over the objections of one panelist--Jeanne S. Chall, professor of education at Harvard University's graduate school of education--who argued that the declines were real, the majority of the members concluded that the technical changes may have adversely affected the performance of poorer students, driving down average scores.
In addition, they suggested, the results of the 1984 reading assessment may have been unusually high.
To prevent such problems from recurring, the panel recommended an ongoing statistical evaluation and audit of the data collection and reported findings of the National Assessment of Educational Progress, which conducts the assessments.
"Nobody can guarantee something like this won't happen again," the panel's chairman, Edward H. Haertel, associate professor of education at Stanford University, said here last week. "But there should be procedures in place so that if a drop like this did occur, we would have enough confidence in the results we can believe even a surprising change."
"If the confidence in the assessment is such that when something surprising shows up, people look for other explanations, it's not doing us much good," he added.
The panel this week is expected to submit its report to NAEP's policy arm, which will consider implementing its recommendations.
But officials from the U.S. Education Department, which oversees the assessment, said last week that they had adopted its key proposal.
In releasing the department's blueprint for an expansion of NAEP in 1990, Gary Phillips, chief of cross-sectional and special studies at the department's national center for education statistics, said the agency planned to fund a series of studies to validate the test's administration.
"We don't want to have another reading anomaly," Mr. Phillips said. "The validity studies will help us take corrective action."
"We can fix things as they go wrong," he added, "rather than after the shuttle blows up."
Probing the 'Anomaly'
The 15-member panel was appointed by the Education Department last December to investigate the anomaly in the 1986 reading scores, which showed unusually steep drops in the performance of 9- and 17-year-olds since the previous assessment.
Those abnormal results had also prompted a review by the Educational Testing Service, which administers the assessment, and caused the firm to delay the reporting of the results by more than five months. (See Education Week, Jan. 20, 1988.)
The department's panel was also charged with making recommendations for the expansion of NAEP to allow state-by-state comparisons of student-achievement data. The assessment's reauthorization, signed into law in April, sets a trial interstate assessment of 8th-grade mathematics in 1990, and of 4th- and 8th-grade math and 4th-grade reading in 1992.
The study group presented its findings here at the annual assessment conference sponsored by the Education Commission of the States and the Colorado Department of Education.
According to Mr. Haertel, the study group generally concurred with the E.T.S. review of the 1986 anomaly, which concluded that the drop in scores did not represent a drop in achievement.
But that analysis, which focused on the decline in average scores, failed to examine the variation among scores, he said. In 1986, the spread between low- and high-performing students was much wider than in 1984.
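Mr. Haertel's distinction between an average and the variation around it can be illustrated with a minimal sketch. The two score lists below are hypothetical and are not NAEP data; the function name `summarize` is likewise invented for illustration. The sketch shows only that two groups with similar averages can differ sharply in spread, which an analysis limited to averages would miss.

```python
from statistics import mean, pstdev

# Hypothetical scores for two administrations of a test.
# These numbers are invented for illustration and are NOT NAEP results.
earlier_scores = [240, 250, 260]
later_scores = [190, 245, 300]


def summarize(scores):
    """Return (average, population standard deviation) for a list of scores."""
    return mean(scores), pstdev(scores)


earlier_avg, earlier_spread = summarize(earlier_scores)
later_avg, later_spread = summarize(later_scores)

# The averages differ by only a few points, but the spread is far wider
# in the later group: low scorers fell while high scorers rose.
print(f"earlier: average={earlier_avg}, spread={earlier_spread:.1f}")
print(f"later:   average={later_avg}, spread={later_spread:.1f}")
```

In this invented example the small drop in the average conceals a large fall among the lowest scorers, the kind of pattern that only a look at the variation would reveal.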
Administration or Curriculum?
The department's panel found that these changes in performance were most likely caused by changes in the test's design and administration.
The 1986 assessment was administered 22 days earlier than the 1984 test, it found, and required pupils to answer more background questions. These factors may have resulted in poorer performance among some pupils, it suggested.
It also concluded that the 1984 results may have been unusually high. School reforms enacted in 1983, such as stiffer graduation requirements and minimum-competency tests, may have caused a temporary jump in the dropout rate, which would have raised the average score of the 17-year-olds, the panel found.
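The arithmetic behind the panel's dropout argument can be sketched briefly. The scores below are hypothetical, not NAEP data, and the variable names are invented for illustration. The sketch assumes that the students who leave school are disproportionately low scorers; under that assumption, the average of those still enrolled rises even though no individual student's performance has changed.

```python
from statistics import mean

# Hypothetical scores for a cohort of 17-year-olds.
# These numbers are invented for illustration and are NOT NAEP results.
students_who_stay = [300, 295, 290, 285, 280, 275]
students_who_drop_out = [250, 240]  # assumption: dropouts are lower scorers

# Average if every member of the cohort were still in school and tested.
full_cohort_average = mean(students_who_stay + students_who_drop_out)

# Average if only the students remaining in school are tested.
tested_average = mean(students_who_stay)

print(f"full cohort: {full_cohort_average}")
print(f"tested only: {tested_average}")
print(f"apparent gain: {tested_average - full_cohort_average}")
```

The apparent gain here comes entirely from who is tested, not from any change in what students know.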
Ms. Chall, the director of Harvard's reading laboratory, dissented from that conclusion. The declines in performance, she argued in a separate paper, are real, and reflect changes in curriculum and instruction.
For example, she noted, the shift away from phonics instruction in the early grades may have been detrimental to the reading performance of low-achieving 9-year-olds.
But the majority of the panel argued that changes in curriculum alone do not explain the declines.
"There might be reasons to expect declines in reading performance," said Mr. Haertel. "I don't think it can explain that much. Changes in curriculum just don't happen that fast."
The department's outline for the expanded assessment, according to Mr. Phillips, was based in part on the panel's report.
The outline calls for two assessments: a national assessment, based on a national sample of 90,000 students, conducted by a single national contractor; and state assessments, conducted by each participating state, which would be based on a sample of at least 2,000 students in each.
The national assessment would include a much larger sample of private-school students than have past tests, he noted.
NAEP will collect and provide state-level data, but will not rank the states. "It is highly likely that the department, through a separate contract, will conduct the rankings," he said.
Mr. Haertel's panel urged that the assessment broaden its content coverage and ask students more background questions, in order to gather more data on what students know and can do.
The panel also recommended that the assessment gather enough data to enable analysts to estimate the achievement level of each test taker. Such an analysis would allow researchers to gain a better understanding of variations among students and schools, panel members said.
But the Education Department, citing cost constraints, rejected that suggestion. In its blueprint, the department proposed limiting the time of the assessment to one hour.
"Some things they wanted to do we just can't do," said Mr. Phillips.
Mr. Phillips added that the department is soliciting comments on its blueprint, which will be published in the Federal Register later this summer.
States must decide by Sept. 1 whether they will participate in the 1989 field test of the expanded assessment, and by Dec. 1 whether they will participate in the 1990 assessment.