New Tests Based On Performance Raise Questions

By Robert Rothman, September 12, 1990

As educators rush to embrace performance-based assessments as alternatives to traditional multiple-choice tests, an increasingly vocal chorus is warning that the bandwagon may be traveling too fast.

Hailed as a promising method of improving the way schools measure student abilities, performance-based assessment has quickly risen to the top of policymakers’ agendas. At least half the states are developing, or say they plan to develop, such assessments as part of their testing programs, according to a survey by a University of California at Los Angeles researcher.

And the National Assessment of Educational Progress has included performance components on its 1990 tests, and plans to expand them in 1992.

In contrast to traditional tests, performance-based assessments measure students’ abilities to perform tasks, such as conducting science experiments or writing essays.

Some skeptics are charging, however, that it is premature to throw out traditional tests in favor of the new method. Performance-based assessment, these critics contend, is an untried method that will cost substantially more than traditional tests but may not prove to be a better measure of student abilities.

Moreover, some suggest, advocates of performance assessments may have inflated their value by claiming that they would improve teaching and spur instruction on higher-order skills.

Performance assessment is “like ‘Star Wars’: the idea remains to be demonstrated as feasible,” said Chester E. Finn Jr., professor of education and public policy at Vanderbilt University. “There are all kinds of complications people are discovering.”

Although such concerns have, in most cases, failed to stem the tide toward the new assessments, they have blocked at least one highly visible foray. After hearing objections from a National Academy of Sciences panel, the federal government declined to fund U.S. participation in an international performance assessment in mathematics and science. (See story below.)

Grant Wiggins, an assessment consultant based in Rochester, N.Y., said the problems the critics are citing arise because assessment programs are asked to do too much. In addition to measuring students’ abilities and knowledge, he said, tests are increasingly used to judge the quality of schools and school districts.

“There is no question that if we wanted to assess students for the purpose of improving the quality of instruction, we could do it tomorrow,” said Mr. Wiggins. “It’s what other countries have done for 50 years.”

“The problem is,” he added, “for a variety of reasons, most people still want a technically rigid, comparative accountability system. Demanding that simultaneously is a real technical problem.”

Dale Carlson, director of the California Assessment Program, said that, rather than pause and wait for better data on the value of performance assessments, states and districts must move ahead in developing the new tests so that researchers can analyze their strengths and weaknesses.

Whatever shortcomings the assessments may hold, he said, they pale in comparison with those of multiple-choice tests.

“The narrowing of the curriculum that is taking place, inadvertently, as a result of our not [using performance assessments],” Mr. Carlson said, “is in the long run more damaging than any statistical problems that may be occurring in the short run.”

While performance testing is not new, the idea advanced rapidly as states and districts stepped up their testing programs in the 1980’s.

The most widely used form of such testing, writing assessments, is currently in use in 28 states. But at least half the states have in place or are considering developing such tests in other subjects, according to Pamela R. Aschbacher, a researcher at the federally sponsored Center for Research in Evaluation, Standards, and Student Testing at UCLA.

Seven states (Delaware, Hawaii, Maine, Massachusetts, Michigan, New York, and North Carolina) now have such tests in at least one subject, Ms. Aschbacher found, and another six (Alaska, Arizona, California, Connecticut, New Jersey, and Vermont) are actively developing them.

Ten other states are exploring the idea or plan to do so, she found. “A number said they’d like to go in that direction, and probably will, but not for five or six years,” she said.

Districts involved in restructuring their school systems have also moved to develop performance-based assessments, according to Bruce Goldberg, co-director of the center for restructuring of the American Federation of Teachers. Last month, he said, representatives from 13 urban districts met in Philadelphia to share ideas about ways to implement the new forms of assessment.

The concept has also received a boost from the National Commission on Testing and Public Policy, a Ford Foundation-sponsored panel, and from the U.S. Congress, which asked its Office of Technology Assessment to study the idea.

And in what some observers say is a clear sign that performance assessment has gained a foothold in schools, commercial test publishers are also getting into the market. (See story, page 11.)

The growing interest in alternative assessment, proponents say, reflects a growing dissatisfaction with traditional tests. They charge that multiple-choice tests narrow the curriculum by encouraging teachers to focus on the basic skills and factual knowledge that such tests measure.

By contrast, argued Ramsay W. Selden, director of the education-assessment center for the Council of Chief State School Officers, performance-based tests foster instruction in a broader range of abilities.

“The deeper we go into cognitive learning,” he said, “the more necessary it is to come up with new ways of measuring learning, and the more limited and difficult it becomes to measure with traditional multiple-choice test formats.”

Some experts questioned, however, whether performance assessments would promote better methods of instruction.

Teachers can teach to performance assessments, just as teachers now teach to multiple-choice tests, said George F. Madaus, director of Boston College’s center for the study of testing, evaluation, and educational policy.

“A lot will depend on how they are used, just as with multiple-choice tests,” he said. “If they are used for heavy accountability, high-stakes purposes, they will have the same kinds of problems.”

Moreover, added Christopher T. Cross, assistant U.S. secretary of education for educational research and improvement, moving to performance assessments will not necessarily result in better performance by students, particularly by those who have fared poorly on traditional tests.

“People thought performance assessment would perhaps show that schools are in better shape, that kids are learning better, and that different socioeconomic groups are performing better,” said Mr. Cross. “In fact, performance assessment reflects reality.”

Mr. Finn of Vanderbilt, who preceded Mr. Cross as the Education Department’s research chief, noted that some performance measures may show wider gaps between advantaged and disadvantaged students. On the 1988 NAEP writing test, he pointed out, the disparity between whites and blacks increased when the test provided more time for students to complete their essays.

“If the objective is to narrow the gap between minority and majority student performance, changing the assessment cannot be counted on to do that,” Mr. Finn said. “We don’t change assessment in order to change outcomes. We do it to learn more than we can learn from previous assessments.”

Mr. Wiggins, the assessment consultant, argued that those who hope changes in testing methods will produce better outcomes are guilty of “naive expectations.”

“If one measures better what one values, one will get better information about what one values,” he said. “To be disappointed is to have silly expectations.”

He added that schools should not expect changes in assessment practices, by themselves, to improve instruction. What is needed, he said, is a comprehensive strategy that starts with raising standards for what students should know and be able to do.

“It’s wrong to say [performance assessments] were oversold; they were overbought,” Mr. Wiggins said. “It’s part of the puzzle. The restructuring argument can’t fixate on any one component.”

“You can send a message with the test,” added Mr. Selden, “but teachers still have to be shown how to teach in new ways. That has to be accomplished by professional development and re-education of teachers.”

In addition to questioning whether performance assessments are better than traditional tests, some critics also charge that they may in fact be worse.

Many districts have concluded that writing tests are not worth the additional costs in money and time required to administer and score them, said H.D. Hoover, director of the Iowa Basic Skills Testing Program.

“I wish like the dickens more people would use our writing sample more than they do. They don’t,” he said. “They are getting for individual kids a minimal amount of information for high cost.”

Despite the value of some traditional tests, “the current climate seems to be, let’s replace them with other things,” he said. “That’s a foolish idea.”

Robert L. Linn, professor of education at the University of Colorado at Boulder, pointed out that, with performance assessments, policymakers can draw few conclusions about students’ overall performance in a subject area, because such assessments measure performance on a relatively small number of tasks.

“With one or two science tasks, generalizing to science in general is risky,” he said.

The limited number of items on performance tests may also narrow the curriculum even more than multiple-choice tests do, added Mr. Finn.

“If a performance assessment in history turned out to be that everybody had to create a docudrama on the Great Depression, it’s possible everyone will spend the whole year on the Great Depression, and nothing before or since,” he said.

But Ruth Mitchell, associate director of the Council for Basic Education and the author of a forthcoming book on alternative assessments, said such concerns arise from a misconception of what performance assessments can measure.

“It’s precisely the opposite of what happens,” she said. “To do a science experiment, chosen properly, you have to use all kinds of knowledge and skills.”

Such concerns, she contended, also reflect a “misunderstanding of what learning means.”

“Learning isn’t only small bits of knowledge,” Ms. Mitchell said. “It’s being able to apply it to a situation.”

While psychometricians have been the most vocal in raising questions about alternative assessments, parents have also voiced their concerns, noted Mr. Goldberg of the AFT.

The officials from the urban districts who met in Philadelphia last month, he said, all reported that parents have expressed fears that performance assessments appear to be changing the rules of the game.

“They are asking, ‘Are these exams going to get our kids into a good college?’” Mr. Goldberg said.

Some parents, particularly members of minority groups, also regard with suspicion a new program that evaluates children’s abilities on what they consider less-than-objective standards, according to Shirley Weber, vice president of the San Diego School Board.

Speaking at a recent conference sponsored by the Panasonic Foundation, Ms. Weber said alternative assessments, such as portfolios, are “all subjective. It begins subjective, and it ends subjective.”

“That’s difficult to market to parents who don’t have a level of trust” in the school system, Ms. Weber added. “You’re going to have parents who say, ‘Is my child at the 50th percentile? Is my child at grade level? Can my child read?’”

Such concerns could lead to lawsuits against schools that use tests to determine student advancement or rewards, Mr. Finn warned.

“When testing is used for high-stakes purposes, people get uptight whether Susie Smith’s performance is judged by the same criteria as Bobby Jones’s,” he said. “When you start to withhold scholarships and drivers’ licenses [on the basis of test scores], you find people worry about the subjective, non-comparative nature of the assessments.”

Mr. Hoover of the Iowa basic-skills test said the assessment items themselves might leave schools open to charges of bias.

Because the alternative assessments ask fewer questions than multiple-choice tests, he said, they are less able to “balance out unfairness.”

“What we do is balance the nature of materials,” Mr. Hoover said. “If there is a passage about a rural area, we will have some topic from an urban environment. The more items you can sample, the more you can balance those things out.”

“Any single item on a test is definitely biased against some kids, if it is set in a context that is interesting,” the test director added. “The more limited your selection of materials, the more likely you are to have materials that, taken alone, will be unfair to groups of students.”

D. Monty Neill, associate director of the National Center for Fair & Open Testing, or FairTest, acknowledged that performance assessments are susceptible to bias, but said they provide information to help schools curb it.

“Multiple-choice tests tell you only that this group is doing better than another group,” he said. “Portfolios suggest ways to work on the problem.”

In their pleas for caution, the critics have urged advocates of alternative assessments to provide data showing that the new methods are technically sound measures of student abilities.

“The virtues people cite for it are fine, but they aren’t on the psychometric song list,” said Edward Haertel, professor of education at Stanford University and a member of the National Academy of Sciences’ board on international comparative studies in education.

In the proposed international study using performance assessment, he said, “the stakes are too high to be doing this kind of development work.”

Added Mr. Hoover of Iowa: “You’d better give me evidence that they are [providing] information that is more reliable and valid, and has better consequences associated with it. I haven’t seen anybody gather that information. Mostly they talk about how nice it is.”

In fact, he asserted, such tests are “less reliable and less valid. It appears to be more valid to people, but that doesn’t mean it is.”

Mr. Carlson of the California Assessment Program responded that the issue of whether performance assessments were more valid measures than traditional tests was a “red herring.”

“By and large, that’s the whole reason for moving to this kind of testing,” he said. “Multiple-choice tests only measure 30 percent of the curriculum. What do you do with the other 70 percent?”

Mr. Carlson acknowledged, however, that researchers must determine whether the alternatives are as reliable as current tests. He said that, as president of the National Council on Measurement in Education, he has appointed a task force to study such questions.

Mr. Madaus of Boston College said such problems can only be solved if more states and districts put the alternatives in place.

“You don’t solve the problems until you get involved in implementation and get data you can work with,” he said. “If you look around the country, and say where is it happening, everybody points to Connecticut and California. We don’t have a big enough sample base to work out a lot of problems.”

“I do think the problems are solvable,” he added, “but they are certainly out there.”

Without such evidence, Mr. Madaus cautioned, performance assessment could “become a fad like creativity was 10 or 15 years ago, or the way computer-based instruction was a decade ago.”

“The real danger,” he said, “is that this becomes one of those, rather than substantive. I hope it becomes substantive.”

Perhaps a bigger worry, suggested Mr. Hoover, is that the pendulum will swing back and schools will place even more emphasis than they do now on traditional tests.

“I worry that if we overrely on something, we’ll jump in the other direction,” he said. “In the testing movement, if in fact performance assessment in most cases is promising more than it can deliver, [advocates] are going to kill the very thing they are for.”

“The biggest danger of the bandwagon effect,” Mr. Hoover warned, “is that we’ll end up right back where we are, with too much emphasis on [tests like] the ITBS.”

A version of this article appeared in the September 12, 1990 edition of Education Week as New Tests Based On Performance Raise Questions