Performance-Based Assessment Gains Prominent Place on Research Docket
Spurred in part by strong interest from educators and policymakers, researchers have turned the study of alternatives to traditional standardized tests into one of the hottest topics in their field, and their work has begun to bear fruit.
Over the past few years, more and more states and districts have added performance-based assessments and portfolios to their testing repertoires amid criticism that such methods were untried and their consequences uncertain.
But in recent months, a host of scientists--many assisted by the federal government--has begun to develop an understanding of the new instruments, researchers say. Although much of the work is preliminary, the studies have found some promise--as well as some potential pitfalls--in the alternative forms of assessment.
One much-anticipated study, slated to be released next week, is expected to show both the benefits and drawbacks of the new methods. In an analysis of Vermont's pioneering portfolio assessment, researchers from the RAND Corporation are said to have found that the program has improved instruction in writing and mathematics, but that the state may not yet be able to use the data as a measure of student performance in those subjects.
Performance assessment "is not a magic bullet that will solve all the problems of schools,'' said Stephen P. Klein, a senior researcher at RAND, who is not involved in the Vermont study. "But I wouldn't say it's a false chase.''
Mr. Klein and others caution, however, that policymakers' interest in the new tools continues to outpace the scholarship. And, they warn, the movement may collapse without a firm research base.
"We're trying to fly the airplane and fix it at the same time,'' said Eva L. Baker, a co-director of the center for research on evaluation, standards, and student testing at the University of California at Los Angeles, which held a recent conference on the topic.
"The real problem,'' she said, "is how to keep the thing alive long enough so reasonable data can be obtained, and not have people jump to conclusions that it's the greatest thing, or that it's been oversold.''
Ideas From Research
Hailed as one of the most promising school-reform strategies, performance-based and portfolio assessment has quickly risen to the top of the educational agenda in the past few years.
In contrast to traditional tests, the alternatives measure students' abilities to write essays, conduct science experiments, and construct and solve math problems, rather than simply answer multiple-choice or short-answer questions. Advocates of the assessments maintain that they can improve instruction by encouraging teachers to focus on higher-order skills, rather than memorization and rote drill.
At least 30 states have implemented such techniques as part of their testing programs, according to a study by the Pittsburgh school district, and others are developing them or are planning to do so. In addition, the New Standards Project, a partnership involving 17 states and six districts, has piloted a national examination system that is completely performance-based.
But while policymakers have embraced the new instruments, researchers have tried to raise caution flags. They have stressed that performance-based assessments will cost considerably more than conventional tests but have not proved themselves better methods of measuring student performance. (See Education Week, Sept. 12, 1990.)
In 1990, in fact, Ms. Baker conducted an extensive search of the literature on performance assessment and found almost no data.
Such findings are ironic, since the concept of measuring student performance through alternative techniques grew out of research and practice, not out of the policy community, Lorraine M. McDonnell, a professor of political science at the University of California at Santa Barbara, observed.
"It was not elected officials who dreamt up performance assessment and portfolios,'' she said. "The ideas came from research and education reform.''
Effects of Current Tests
Specifically, cognitive scientists seeking ways to change curricula and instruction to enhance student learning embraced assessment reform as a powerful lever, said John R. Frederiksen, the director of the cognitive-sciences-research center in the Educational Testing Service's San Francisco Bay-area office.
"A lot of people in cognitive science have come to the conclusion that it is wonderful to work on pieces of cognition, but, at a certain point, one believes enough in one's ideas that one wants to do something to bring about change,'' he said. "There was a migration into the notion that assessment is the key place to begin.''
At the same time, said George F. Madaus, the director of the center for the study of testing, evaluation, and educational policy at Boston College, researchers have amassed evidence of the harmful effects of traditional tests.
In a recently released study funded by the National Science Foundation, for example, Mr. Madaus found that the six major standardized achievement tests, as well as the tests contained in textbooks, drive instruction toward low-level knowledge and skills, rather than the higher-order abilities advocated by reformers. (See Education Week, Oct. 21, 1992.)
"We've got to get something better than what we've got,'' Mr. Madaus said. "They certainly aren't measuring what the National Council of Teachers of Mathematics is calling for, what the science community is calling for.''
'In for the Long Haul'
Responding to such concerns, researchers have stepped up their efforts to study the new forms of assessment.
Federal agencies have also pitched in to support such projects. The U.C.L.A. research center on testing, which is sponsored by the U.S. Education Department, has seen its budget increase by 2 1/2 times over the past three years, according to Ms. Baker.
But one Education Department study of the topic was canceled last month after funding for it dried up. The department had awarded a grant to Pelavin Associates, a Washington-based firm, to study the effects of new assessments on schools as part of an evaluation of school reforms. But the department was forced to cancel the program when funding for it was not included in the department's fiscal 1993 budget.
In addition to the Education Department projects, the N.S.F. last year launched a $4 million project to fund studies on alternative assessments in math and science.
Francis X. Sutman, a program director in the science foundation's directorate for education and human resources, said the agency "is in this for the long haul.''
"We are not going to be able to solve problems and give solutions immediately,'' he said. "But there has to be a research base upon which the movement can build.''
Close Touch With Teachers
In addition to the federally backed projects, the National Board for Professional Teaching Standards has sponsored research on performance assessment to be used in assessing teachers for its certificates.
And many states and districts that have put new assessments in place have also conducted or sponsored evaluations and studies of them. For example, researchers from the Pittsburgh public schools collected and analyzed a wealth of data as part of the district's effort to evaluate student writing on the basis of portfolios. (See Education Week, Aug. 5, 1992.)
Ms. Baker of U.C.L.A. noted that studies of performance assessment have brought researchers in close touch with teachers and may help close the traditional gap between research and practice.
"Because of the nature of the questions we are asking,'' she said, "the research has to be embedded in practice--with teachers, rather than exclusively using teachers as sites or subjects for us.''
"Years ago,'' she added, "education researchers used to work on solutions to problems no one had. Now, problems are coming out of classrooms, and people are ready for collaboration.''
Uses and Limitations
The work that has been conducted thus far has unearthed evidence that alternative forms of assessment can be used to measure student and teacher performance.
Studies conducted for the teaching-standards board, which have examined the use of "video portfolios'' of teachers' work, have found that such techniques "can be used reliably to score teacher performance in a variety of settings,'' said Joan C. Baratz-Snowden, the board's vice president for assessment development. "That's a major breakthrough.''
But the RAND study of the Vermont portfolios is expected to show that the problem of coming up with reliable scores has not been completely solved.
And other studies have also uncovered limitations in the alternative assessments.
Noreen Webb, a professor of educational psychology at the U.C.L.A. graduate school of education, found that assessing students in groups cannot be used to judge individual students' performance. Performance-assessment advocates have cited as one advantage the fact that such methods could, unlike traditional tests, be used to gauge abilities to work in groups.
Working with 7th graders from a Los Angeles middle school, Ms. Webb asked students to work in groups to solve certain math problems, such as calculating the cost of a long-distance telephone call. Weeks later, when she and her colleagues tested the students individually on the same problems, she found that many students who had been able to solve the problems as part of a group could not do so on their own.
"Should we throw group assessment out the window? No, of course not,'' Ms. Webb said. "But we can't use group-assessment scores to shed light on what students could accomplish by themselves.''
How Many Tasks?
In a separate set of studies, Richard C. Shavelson, the dean of the school of education at the University of California at Santa Barbara, pointed up a potentially more serious issue: the number of tasks required to gauge a student's performance.
Examining middle school students who conducted a series of science experiments, such as determining which paper towels are most absorbent, Mr. Shavelson found that each task measured only a fraction of students' abilities in science. Overall, he found, the assessment needed to include as many as 10 tasks--a formidable number, given the amount of time each task takes--in order to gauge science performance.
But Ms. Baker of U.C.L.A. said she and her colleagues may have come up with a solution to that problem.
Studying performance assessments in history, such as asking students to write essays demonstrating their understanding of the Lincoln-Douglas debates, Ms. Baker found that student performance could be evaluated in as few as three tasks by specifying at the outset the skills and knowledge that are to be assessed.
The problem that Mr. Shavelson and others found, she said, arose because teachers scoring the assessments came up with the criteria for evaluating performance after the tasks were administered.
Ms. Baratz-Snowden said researchers at the teacher-standards board have come up with similar findings.
"If video is used to find out if accomplished teaching is going on, you need a lot of video,'' she said. "But if you want to find out if teachers can do certain things--and you specify up front the things you are looking for--you don't need as much.''
In addition to examining the designs of the assessments, researchers have begun to learn about the problems in implementing them.
Daniel M. Koretz, a senior social scientist at the RAND Corporation, said that the study of Vermont's portfolio assessment found that it is time-consuming and costly to put in place. However, teachers and principals said they considered the program a "worthwhile burden.'' (See Education Week, Sept. 9, 1992.)
Mr. Madaus of Boston College, who has studied the performance-based-assessment system in Britain, said that country's system has pointed up the difficulty of comparing the results of the new assessments.
The British education secretary, he noted, has said standards are declining because the number of students passing the examinations has increased over the past four years. The only way to determine if he is correct, Mr. Madaus said, would be to equate the different years' exams, which is "a tough thing to do.''
"One issue keeps coming up,'' Mr. Madaus said. "How do you build a bunch of tests, all purporting to measure the same things, and equate them? That hasn't yet been solved.''
Questions of Equity
Other researchers have pointed out that implementing performance assessments and portfolios also raises questions of fairness and equity.
Mr. Shavelson of the University of California at Santa Barbara, for example, noted that his studies have found that the assessments are "extraordinarily curriculum-sensitive,'' which puts students from disadvantaged areas, who have had less access to hands-on science instruction, at a disadvantage.
"If you had not had access to the curriculum, you're not going to look pretty good'' on the assessment, Mr. Shavelson said.
Similarly, said Dennie Palmer Wolf, a senior research associate at the Harvard University graduate school of education, portfolios tend to offer an advantage to students who are relatively fluent in English.
"Until we break the stranglehold of language on portfolios and open them up,'' said Ms. Wolf, who is studying the implementation of portfolios in middle schools in four cities, "we will again have in portfolios just a different sort of sorting method.''
In fact, said H.D. Hoover, the director of the Iowa Basic Skills Testing Program, the literature on performance assessment suggests that the gaps between advantaged and disadvantaged students are larger on such measures than they are on multiple-choice tests.
"People say about standardized tests, 'If different groups perform differently, there is cultural bias,' '' Mr. Hoover said. "If you buy that argument, most performance assessments to date are more biased than standardized tests.''
But despite such problems, researchers have also found that implementing performance assessments and portfolios has paid dividends for schools.
For one thing, Ms. Wolf said, the new methods, as advocates had expected, have indeed tapped knowledge and skills that traditional tests--and most schools--often miss.
"Students are capable of extraordinary work,'' she said. "Schools do not require extraordinary work.''
And Daniel P. Resnick, the director of research for the New Standards Project, noted that teachers and students in that project have demonstrated tremendous enthusiasm for the new instruments. In addition to conducting technical studies of validity, reliability, and fairness, Mr. Resnick said, researchers should conduct case studies that show the "pleasure and fun'' that teachers and students experience.
"Research has to look at the enthusiasm, as well as the hard side,'' he said.
In addition to studying the use of alternative assessments in large-scale testing programs, researchers have explored improving teachers' use of the tools in their own classrooms.
In many ways, said Thomas A. Romberg, the director of the national center for research in mathematical-sciences education at the University of Wisconsin at Madison, the classroom assessments may be more important than the large-scale programs.
Although advocates of alternative assessments at the national level argue that changing assessment will change instruction, Mr. Romberg said, "That's wishful thinking.''
"Real changes are largely dependent on teachers,'' he said. "If they don't believe in and understand the direction we are going, the rest of it will not happen.''
In his work, Mr. Romberg is developing ways to help teachers make judgments about student performance.
"The argument we make,'' he said, "is that teachers have been de-skilled to such an extent that they don't believe their judgment has value.''
In another set of studies, Maryl Gearhart, a project director at the center for the study of evaluation at U.C.L.A., is working to improve teachers' use of portfolios to gauge student performance in writing and math.
In early tests of writing portfolios, she noted, teachers generally failed to provide enough guidance to enable students to make genuine assessments of their own work. As a result, she said, students seldom made substantive revisions, or held their work to high standards.
Mr. Sutman of the N.S.F. also argued that more work is needed to ensure that alternative assessments in math and science test students' knowledge of content in those subjects. In preliminary efforts, he said, teachers have developed items that were "content-void or -weak, or have content errors in them.''
No Longer Seen as 'Difficult'
While most researchers acknowledge that there is much work left to do, the progress so far has convinced at least one former skeptic of the value of alternative assessments.
"When I started here,'' said Ms. Baratz-Snowden of the teacher-standards board, "I said, 'I can't see myself with a terrarium rotting on a radiator while there is a lawsuit [from a candidate denied a certificate based on a portfolio assessment].' That was my view of portfolio assessment.''
"But I no longer see it as difficult,'' she continued. "I see it, in fact, as a critical element in helping professional-development activities, and in substantiating the standards. I think it's