NAEP Will Make Its Most Extensive Use Of Performance Items
Washington--In a major effort to enhance the quality of its data on student achievement, the National Assessment of Educational Progress next month will undertake its most extensive use of performance-based testing.
The 1990 assessment will include, in addition to traditional multiple-choice questions, mathematics questions assessing students' use of scientific calculators, open-ended math and reading questions, and science questions that require students to draw responses or write essays.
In addition, it will for the first time measure students' writing abilities based on a portfolio of materials students have already produced.
Although several states have included performance-based testing in their assessment programs, few have involved as many subjects or tested as many students. NAEP will test a national sample of about 100,000 students in grades 4, 8, and 12.
Moreover, the results of the national assessment will have substantial nationwide significance, since the 8th-grade math test is the first to provide state-by-state comparisons of student achievement. And, under a plan being considered by NAEP's governing board, it could be the first test to measure performance against national standards.
Chester E. Finn Jr., chairman of the governing board, said the assessment will provide substantially more information about student abilities than past NAEP tests, which asked primarily multiple-choice questions.
"I am not hostile to multiple-choice assessments," Mr. Finn said. "I do think there are a fair number of things you can find out through multiple-choice tests that are worth finding out."
"But where multiple-choice tests begin to be very unrevealing," he added, "is in getting at things like reasoning and analytic abilities, and the capacity to blend skills and knowledge to solve problems."
"Most of the real-world, higher-education tasks we are seeking to monitor kids' abilities in," Mr. Finn said, "are the very tasks that don't lend themselves to a multiple-choice response."
Mr. Finn acknowledged that the performance-based components of the 1990 NAEP, which make up about 30 percent of the total battery, represent a limited use of such techniques. But he noted that the federally funded project was constrained by cost--reading essay questions costs up to 300 times as much as having a computer scan a multiple-choice test--and by the need to maintain continuity with past assessments.
But critics of traditional tests, while hailing the NAEP innovations, cautioned that the assessment must go further if it is to present an accurate picture of student achievement.
"If we have national goals of sufficient breadth, we need broader performance assessment to assess progress toward them," said Ruth Mitchell, associate director of the Council for Basic Education. "If NAEP is five-sevenths multiple-choice, of course there will be multiple-choice teaching."
'A General Thrust'
The national assessment, often called "the nation's report card," has served since 1969 as the principal vehicle for gauging national trends in student performance in, among other subjects, reading, writing, mathematics, and science.
Currently operated by the Educational Testing Service under contract to the U.S. Education Department, NAEP has also earned the respect of the testing community for innovations in testing technology. Last year, for example, four NAEP staff members won the National Council on Measurement in Education's triennial award for technical contribution to educational measurement.
The development of performance items, according to Ina V.S. Mullis, NAEP's deputy director, reflects the growing interest in the testing field in such forms of measurement.
"Generally, people involved in measurement and assessment are interested in more innovative measures of more complex skills," Ms. Mullis said. "It's a general thrust in assessment across the board."
Several states, including Connecticut and California, have launched efforts to produce large-scale performance-based tests as part of their statewide assessment programs. (See Education Week, Sept. 13, 1989.)
Cost and Training
But NAEP's effort is one of the most ambitious attempts to introduce such assessments, experts point out, because of its scale.
Compared with traditional multiple-choice tests, performance-based tests cost considerably more to administer and require large investments in training administrators and scorers, noted Jules Goodison, NAEP's associate director.
Scoring a multiple-choice item, he said, takes about 10 minutes and costs between 5 and 10 cents; by contrast, reading an essay takes several days and costs between $1 and $3.
For the 1990 math assessment, Mr. Goodison added, NAEP also purchased 52,000 calculators to distribute to students for use during the examination. The assessors also trained students in their use.
Officials from ETS also trained about 130 people to score the open-ended questions, Mr. Goodison noted. Over the past few weeks, scorers met in Iowa City, Iowa, to read sample responses and evaluate them against guidelines developed by the firm.
Even with such training, Mr. Finn said, the scoring could pose problems.
"In some areas," he said, "it is very difficult to get people to agree on what is 'adequate' performance."
An additional complication in the development of the 1990 test, Mr. Finn added, is the fact that it will include state-by-state results. No state wanted to be embarrassed by its students' performance on items that may not reflect its curriculum, he said.
But Mr. Goodison said state officials enthusiastically endorsed the inclusion of open-ended questions, even though they expected students to perform relatively poorly.
"The more difficult-to-measure skills are often the most difficult to perform," Ms. Mullis added. "Unless we measure them, we won't know whether students will perform poorly or not. We have not measured them that well."
To assess such skills, the 1990 assessment will measure student performance in a variety of ways.
Perhaps the most ambitious innovation is the writing portfolio. Although Vermont is now developing a statewide writing assessment that will include the collection of student portfolios, no other large-scale testing program has attempted such a step.
Under the project, which is a test for large-scale use in the 1992 writing assessment, NAEP officials will collect samples from some 3,000 students across the country.
Ms. Mullis said the portfolio would address what some critics have maintained is a serious drawback in NAEP's writing assessments--the short time period available to students.
"Hopefully, there will be a lot of examples of excellent writing," she said. "We'll see what students are capable of producing outside the assessment constraints."
In addition, she said, the portfolio will provide a snapshot of the writing instruction that goes on in a sample of U.S. classrooms.
"It may turn out that the descriptive analysis is just as interesting" as the test results, Ms. Mullis said. "This will give us a picture of what assignments students are given and what they produce in response to those assignments."
Use of Calculators
The rest of the tests will also include performance measures.
The mathematics test, for example, will ask three types of questions to gather data on student use of calculators.
In one type, such devices would be useless in solving the problem; in another, either a calculator or paper-and-pencil computation could be used; and in a third, students would be virtually unable to answer the questions without the tools.
Such questions go beyond traditional tests by assessing students' abilities to determine whether a calculator is appropriate for solving problems, according to Ms. Mullis.
"Can they engage in complex problem-solving using calculators as a tool to help them in their endeavors?" she asked.
In addition to the calculator problems, the math test includes a component requiring students to estimate their answers, a skill math educators consider vital.
In such questions, assessors will use audiotapes to pace students through questions, providing too little time for calculation.
The math test will also include a variety of open-ended questions that are aimed at measuring students' understanding of mathematical concepts. Such questions may, for example, ask students to describe a geometric figure to someone who cannot see it; to represent algebraic equations graphically; or to write short explanations for their answers.
In science, the test will include questions that require figural responses, such as asking students to trace the circulatory system or to draw what would happen when a man's glasses fall off his face while he is running.
Such questions test students' ability to apply their knowledge to real-world situations, Ms. Mullis said.
Grant Wiggins, director of research for Consultants on Learning, Assessment, and School Structure, or CLASS, an education firm based in Rochester, N.Y., and an expert on performance testing, praised the NAEP innovations as a good first step.
"The tasks I have seen are all headed in the right direction," he said.
However, he noted they fall short of measuring genuine performance because the tasks test student abilities outside the context of the classroom.
"Students are never given adequate time to do authentic work," he said. "You can't write an essay in 10 minutes from scratch."
In addition, Mr. Wiggins questioned whether students will perform at their best, since they and their schools have no stake in the results.
'Just a Toddler'
Mr. Finn, the governing board's chairman, acknowledged that NAEP can do more to ensure that the assessment reflects student performance, and said that the project will continue to move toward the use of performance testing.
"I'm not that proud of where NAEP is in 1990," he said. "NAEP is just a toddler heading in the direction of maturity."