NAEP Panel Sets Three Standards For '90 Math Test
Washington--Asserting that it has resolved the issues that led at least three panels of experts to label the process "flawed," the National Assessment Governing Board has voted to set the first national standards for student achievement.
Acting at its quarterly meeting here this month, the 23-member panel of education, business, and public officials adopted a set of scores on the 1990 National Assessment of Educational Progress test in mathematics that represent "basic," "proficient," and "advanced" levels of performance in the 4th, 8th, and 12th grades.
The board this summer is expected to release a report outlining how students performed against the standards.
The report will, for the first time, show how students should perform on the assessment, according to board members. In the past, they said, NAEP reports have simply described how test-takers performed.
"One thing that is seriously lacking in the national assessment is an external standard," said Francie M. Alexander, associate superintendent of public instruction in the California Department of Education. "If we don't have agreement on what kids should know and be able to do, what does it matter?"
Ms. Alexander acknowledged that the standards the board set do not represent a national consensus on what such knowledge ought to be, but she said the panel could work toward such a consensus in the future.
"We can work it out as we are going," she said. "But we've got to get going."
But Daniel M. Koretz, a member of a technical panel advising the National Center for Education Statistics on NAEP, said many questions still remain about the validity of the achievement levels. As a result, he said, the public could wrongly interpret them as the performance students need to attain to advance to higher levels of mathematics.
In addition, said Richard M. Jaeger, a member of a panel named by the assessment governing board to evaluate the standards-setting process, the standards may not accomplish their goal of improving education by providing a better gauge of student achievement.
"What you are engaged in here is very much a trial to improve the understanding of NAEP results," Mr. Jaeger, a professor of education at the University of North Carolina at Greensboro, told the board. "Your hypothesis was that providing achievement levels would achieve that. It's not clear your hypothesis is sound."
The governing board's action this month represents the culmination of one of the most complex and controversial undertakings in the history of NAEP, a Congressionally mandated project that tests samples of students in reading, writing, math, science, and other subjects.
The standards-setting effort began in late 1989, when the board, a newly constituted body created by the 1988 law that reauthorized the assessment, moved to change the way the data are reported to make them more meaningful to the public and to policymakers.
In contrast to the current system, in which student scores are arrayed on a scale, the board proposed comparing students' performance against agreed-upon standards. Seizing on language from the 1988 reauthorization, the governing board agreed in December 1989 to explore the idea of creating what it then called "achievement goals" for the assessment.
That proposal sparked some criticism from educators, who warned that the goals could create a national curriculum, and from others, who questioned whether the board was the appropriate agency to set such standards.
After modifying its plan to meet some of the objections, the board voted last May to set three levels of achievement on the 1990 math test. (See Education Week, May 23, 1990.)
But while the changes may have allayed some criticism, the way the panel carried out its plan opened up even more.
Employing a commonly used standards-setting procedure--but one seldom used on such a large scale, or on a test that had already been administered--the board brought a group of 63 teachers, math educators, business leaders, and public officials to Vermont for two days last summer to set the standards.
Despite expressing serious misgivings about the process, the group analyzed each question on the test to judge whether students at the basic, proficient, or advanced level of achievement should be able to answer it correctly. (See Education Week, Sept. 5, 1990.)
In response to complaints that the group did not have time to complete the job, however, the governing board reconvened the panel for two days in Washington. Only about half the original group showed up for that meeting, which was held during a Jewish holiday.
The board then asked a smaller group to compile the results and write descriptions of what student performance at each of the three achievement levels would look like. But to allow time to receive public comment on the proposed standards, the board then postponed a decision on whether to adopt them.
Reacting to the proposal, several educators and technical experts urged the board not to adopt the standards. They noted that, among other problems, the differences in the judgments of the raters were too large, and that there was no coherence across grade levels.
In a report to the National Center for Education Statistics, issued in January, the technical-review panel argued that "the current achievement levels, obtained before January 1991, are flawed," and recommended "that the achievement levels be used only if corrected."
Two other expert panels--a three-member group hired by the governing board to evaluate the process and a panel commissioned by the National Academy of Education to assess the trial state-level assessment--also warned that the standards-setting process was flawed.
Roy E. Truby, executive staff director of the NAEP board, acknowledged that the process was imperfect.
"I don't have to tell you, a valiant attempt was made, but there were problems," he said at the meeting here this month.
In response to such concerns, the board agreed to conduct a separate procedure to replicate and validate the original process. In a series of four meetings held in California, Connecticut, Florida, and Michigan, the board convened some 211 people--more than 70 percent of whom were classroom teachers--to make judgments on the test items. That group came up with the scores--measured in percentage of test items answered correctly--the board adopted this month. The scores may change slightly after some additional analyses, officials said. (See chart, this page.)
The "replication/validation" study resolved many of the problems raised by the critics of the earlier process, according to Ronald K. Hambleton, a professor at the University of Massachusetts at Amherst who advised the board on the standards-setting procedure.
"Are the levels technically defensible?" he asked. "I believe they are."
But he and others also suggested that several problems were still unresolved. For one thing, Mr. Hambleton noted, the heavy concentration of classroom teachers in the second group may have skewed the results.
"The earlier group, while there were lots of criticisms leveled at their work, at least had a somewhat different composition," he said.
In addition, Mr. Jaeger of the evaluation team said, the second study failed to answer all the objections raised by the critics. For example, he suggested, the variability of recommendations is "unacceptably high," and there is "insufficient evidence to support the validity" of the achievement levels.
"Clearly, procedural issues have been addressed," Mr. Jaeger said. "But procedural issues are not all this is about."
Mr. Jaeger and his panel recommended that the achievement levels be released only on a "single-trial field-test" basis, and that the board defer action on setting subsequent standards until it could evaluate the "efficacy, utility, and impact of the establishment and dissemination of the levels."
"There are serious questions to answer before you proceed to the task of setting achievement levels for every subject area," Mr. Jaeger said.
'Not Rocket Science'
Governing-board members said they were convinced that the second study resolved the problems and that the levels should be released.
But they agreed that their report, which they are expected to consider at their August meeting, should include the earlier results and should state that the effort is a trial.
Responding to Mr. Hambleton's concern about the makeup of the group, Ms. Alexander of California said she thought teachers' judgments were superior to those of a broader group.
"I feel really good about teachers' judgment," she said. "I'm not unhappy teachers were [disproportionately] represented in that part of the process."
Moreover, she said, such a group could be expected to produce a wide range of judgments about student performance.
"The teachers liked not having to come to a consensus," she said. "They felt their vote counted. We can expect some variability."
Mr. Truby also said the achievement levels would not be used as a "baseline" for tracking progress toward the national goals for student achievement adopted by President Bush and the nation's governors. He said the board would set levels again for the 1992 math assessment.
But he added that he considered the achievement levels valid.
"You can spin the data forever, but, in the final analysis, it's a judgment," Mr. Truby said. "It's not rocket science."
"The question is, is the judgment supportable?" he said. "To a large extent, it is."