Yet Another Report Assails NAEP Assessment Methods
The method used to determine how good is good enough on the National Assessment of Educational Progress is fundamentally flawed, a report by a group of prominent education researchers says.
Released last week by the National Academy of Education, the report represents the sharpest criticism to date of some of the newer methods being used to measure or interpret NAEP results. Like other recent reviews, it raises troubling questions about the NAEP tests, which are used as a national barometer of student progress.
"People interpret this as largely a national discussion among academics, and this is not what the point is,'' said Emerson J. Elliott, the commissioner of the National Center for Education Statistics. "The point is how can people best understand what the national-assessment data tell us about how our schools are doing.''
The spate of criticism has not deterred officials of the $29 million testing program. With no better methods in sight, members of the National Assessment Governing Board say they plan to keep NAEP on its course.
"We're not handcuffed'' to those methods, said Mark D. Musick, the board's chairman. "We just don't believe the criticisms are ones that swamp the boat.''
From 'Can' to 'Should'
Congressionally mandated NAEP examinations have been given to 4th, 8th, and 12th graders since 1969. But the current criticism stems from a policy shift undertaken in 1990 in response to calls for higher achievement standards.
To move NAEP beyond simply measuring how students scored on the tests, the governing board directed testing experts to find ways to assess how well students did against a standard for what students should know.
Beginning with the 1992 assessment in mathematics, students' scores were grouped into three achievement levels: basic, proficient, and advanced. And achievement was gauged against those levels.
Three reviews before the one by the National Academy of Education raised doubts about how scores were interpreted under the new procedures. The General Accounting Office, for example, in a report this summer asked whether student test-takers could actually do what the new achievement levels described. (See Education Week, July 14, 1993.)
The education academy's report echoes that criticism. And it goes a step further, taking issue with the method used to set the achievement levels in the first place.
Under that procedure, known as the Angoff method, expert judges are asked to envision what they see as a particular level of performance. Then they review each test item and determine the probability that a person performing at a particular level could answer it correctly.
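To make the arithmetic of that procedure concrete, here is a minimal sketch of a basic Angoff calculation. It is not NAEP's actual implementation, which the report does not detail. The function name `angoff_cut_score` and the ratings are hypothetical, invented only for illustration. Each judge's item probabilities are summed to give that judge's implied cut score, and the panel's cut score is taken as the average across judges.

```python
from statistics import mean, stdev


def angoff_cut_score(ratings):
    """Return (panel cut score, per-judge cut scores).

    ratings[j][i] is judge j's estimate of the probability that a
    student at the borderline of a level answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate the same items")
    # A judge's implied cut score is the expected number of items
    # a borderline student would answer correctly.
    per_judge = [sum(judge) for judge in ratings]
    return mean(per_judge), per_judge


# Hypothetical ratings: 3 judges, 4 test items (illustrative only).
ratings = [
    [0.9, 0.6, 0.4, 0.2],
    [0.8, 0.7, 0.5, 0.3],
    [0.7, 0.5, 0.3, 0.1],
]

cut, per_judge = angoff_cut_score(ratings)
# The spread of the per-judge cut scores shows how far the judges disagree.
spread = stdev(per_judge)
print(f"panel cut score: {cut:.2f} of {len(ratings[0])} items")
print(f"judge-to-judge standard deviation: {spread:.2f}")
```

A large spread among the per-judge cut scores relative to the panel's cut score would indicate that the judges do not share a consistent picture of the borderline student.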
One problem, the academy researchers say, is that the educators who judged the NAEP standards "could not maintain a consistent view of what a student at the borderline of each level should be able to do.''
"The judges were almost as different from each other as students were on the whole test,'' said Lorrie Shepard, the principal investigator for the study. The panel was chaired by Robert Glaser, the director of the Learning Research and Development Center at the University of Pittsburgh, and Robert Linn, a co-director of the National Center for Research on Evaluation, Standards, and Student Testing at the University of Colorado at Boulder.
The researchers conducted their own studies to determine how valid the new achievement levels were. They compared NAEP data, for example, with teachers' ratings of students, with results from tests independently administered by the researchers, and with results from the Advanced Placement tests, the Scholastic Aptitude Test, and statewide assessments in Kentucky.
In all but one case, the report says, those other measures placed more students at the basic, proficient, and advanced levels than the NAEP data did.
That "suggests that the 1992 achievement levels were set unreasonably high,'' it says.
The researchers recommend that NAEP drop its standards-setting method and look for a better way to measure achievement against a standard. In the meantime, they say, NAEP results should not be reported by achievement levels.
Analysis Said Flawed
In response, Mr. Musick and other testing experts pointed out that the Angoff method is a commonly used standards-setting procedure.
They said the academy's analysis was flawed, in part, because the researchers ignored some data that supported that method. For example, the proportion of students in Kentucky's assessment who scored at the NAEP achievement levels was comparable to that on the actual NAEP exam.
Moreover, the defenders said, setting standards is a matter of policy and informed judgment and not, in Mr. Musick's words, "a scientific search for 'truth.'''
Testing experts said the new report, combined with the earlier ones, adds fuel to what will be a running debate.
"I don't think any one of these reports is going to settle anything,'' said Ramsay Selden, the director of the Council of Chief State School Officers' state assessment center. "We're going to have to keep looking at it.''