Even as Popularity Soars, Portfolios Encounter Roadblocks
This is the second story in an occasional series that will examine trends in assessment and new ways of measuring what students know and are able to do.
Even if portfolios were outlawed, Dwight Cooley would still use them to evaluate his students' academic progress.
The Fort Worth teacher said the portfolios--collections of specific types of student work--allow his 5th graders to see how far they have come and help inspire self-confidence. They give students a voice in their own learning and provide parents a portrait of their children's progress.
Moreover, Mr. Cooley said, they force teachers like him to teach in ways that encourage students to produce the kind of work that reflects well in their portfolios.
"I've seen what it does for students, and I've seen what it does for teachers," he said.
Mr. Cooley's thinking explains in large part why portfolios have become such a popular means of assessment in recent years. Thought to paint a truer picture of student achievement than traditional tests, portfolios are a key to the New Standards project, a consortium of some of the nation's most reform-minded states and school districts that is working to build a national system of standards and assessments.
At least two states--Kentucky and Vermont--have made portfolios a part of their statewide assessment programs. And numerous other teachers and school districts are experimenting with portfolio methodologies.
But testing experts say that while portfolios are a good instructional tool, they have yet to live up to their immense promise as a means of measuring student progress on a large scale.
"There are a variety of reasons people advocate portfolios and, depending on the purpose, they are either working well or not working well," said Edward Roeber, the director of student-assessment programs for the Council of Chief State School Officers. "Some people say portfolios can replace all assessments, and I think that's an unrealistic expectation."
Robert Calfee, a professor of education at Stanford University, agrees.
"There's evidence from a lot of states that externally mandated moves are going to run out of steam simply because it takes too much time and money for what you get," he said.
Apples and Oranges
Part of what is wrong with large-scale portfolio-assessment systems has to do with what is right about them: Like every student, every portfolio is different.
Unlike standardized tests, which might call for students to produce a piece of writing in response to a specific prompt within a set time period, portfolios might contain a diverse array of work products--a poem here, for example, a narrative essay there.
In Vermont, where the state's developing assessment system centers on the portfolio, students are asked to submit their "best" pieces of writing in specific genres. In Pittsburgh, students choose their "most important," "satisfactory," and "unsatisfactory" pieces of work as well as evidence of the revisions they have made.
"It's hard to judge who is the better writer when you're looking at completely different pieces of evidence," Mr. Roeber said.
Mathematics portfolios, like the ones Mr. Cooley's students produce through the New Standards project, might contain descriptions of a survey that students have designed and conducted themselves. There might be a written explanation showing how a student solved a multistep word problem. Mr. Cooley, for example, asks his students how much wood or wire fencing they would need to enclose an outdoor pool area.
Guidelines for what goes in portfolios grow out of scoring "rubrics," or criteria, that usually are devised by teachers and testing experts working together. In Vermont, for example, the five to seven best pieces in students' math portfolios are each scored on a four-point scale against a set of criteria, three of which are meant to gauge how well students solve problems and communicate mathematically.
In most of the portfolio-assessment experiments so far, the job of scoring falls to classroom teachers who have been trained to some degree in the use of the scoring rubrics. All of this makes evaluating portfolio work a hugely complicated task.
Despite seven years of development and evaluation, writing portfolios in Vermont, for example, have only recently begun producing scores judged reliable enough to be reported below the state level.
And a report released this winter on Kentucky's assessment program warned that the pioneering system, which uses both portfolios and performance tasks to gauge student achievement, may not yet be reliable enough to use as a basis for rewarding and punishing schools.
Coming to Agreement
The problem in Vermont has been what experts call "interrater reliability"--the consistency with which different readers assign the same scores to a single portfolio. A machine-scored, fill-in-the-bubble test, for example, is 100 percent reliable. But reliability decreases as scoring becomes more a matter of human judgment.
In Vermont, the interrater reliability on total scores for students' 8th-grade writing portfolios, for example, increased from 0.60 in 1992 to 0.69 last year. That is on a scale of 0.0 to 1.0, with 0.0 signifying that raters did not agree at all and 1.0 meaning that they were in perfect agreement.
The state, however, has had better luck with mathematics portfolios. The reliability factor in that subject has increased, at the 8th-grade level, from 0.53 to 0.83.
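For readers who want to see how a figure like 0.69 might be produced, here is a minimal sketch in Python. The article does not say which statistic Vermont's evaluators used, so this assumes one common choice, the correlation between two raters' scores on the same portfolios, and shows the simpler share of exact matches alongside it. The function names and the scores are hypothetical and are not drawn from any state's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A rater who gives every portfolio the same score has no spread to correlate.
        raise ValueError("correlation is undefined when one rater's scores never vary")
    return cov / sqrt(var_x * var_y)


def exact_agreement(xs, ys):
    """Share of portfolios on which both raters gave the identical score."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical scores from two raters on the same eight portfolios (1-to-4 scale).
rater_a = [1, 2, 2, 3, 3, 4, 4, 2]
rater_b = [1, 2, 3, 3, 2, 4, 3, 2]

print(round(pearson(rater_a, rater_b), 2))
print(exact_agreement(rater_a, rater_b))
```

The two measures answer slightly different questions. Exact agreement counts only identical scores, while a correlation also gives credit when raters rank portfolios in the same order but differ by a point. A correlation can in principle fall below zero, so the 0.0-to-1.0 scale described above should be read as the range that matters in practice.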
A portfolio project in Pittsburgh, on the other hand, has achieved higher agreement with writing portfolios. That school district has been experimenting with portfolios as an outgrowth of Arts Propel, a research project exploring the assessment of learning in imaginative writing, music, and the visual arts.
"I think it had to do with the fact that we spent so much more time looking at kids' writing and coming to agreement on what constituted quality of writing," said Drew H. Gitomer, a senior research scientist at the Educational Testing Service who consulted on the Pittsburgh project. "There has to be a shared understanding."
"It's difficult," he said, "but it's not impossible."
In Vermont and Pittsburgh, however, the stakes attached to portfolio scores are low. The scores are not used, as they will one day be in Kentucky, as a basis for rewarding or punishing schools. And experts say no one has yet determined whether there is any magic number for achieving agreement when stakes are high.
Even if raters agree on scores, however, there is no guarantee that portfolios measure what they are supposed to measure.
For example, how do scorers determine "whose work is being judged?" asked Maryl Gearhart, a project director at the University of California at Los Angeles's Center for Research, Evaluation, Standards, and Student Testing.
"Portfolio contents are drawn from the life of the classroom, and the student's work may be labeled with the student's name, but the work was produced in a social context that is very unlikely to be consistent from student to student and from classroom to classroom," she said.
Ms. Gearhart and her colleagues asked nine elementary school teachers to rate the amount of help they gave students who were producing writing assignments to include in portfolios. They found that teachers varied in the support they gave, often giving the most help to students who needed it the most.
Variation in the amount of help students received, the number of opportunities students had to revise their work, and the amount of collaboration that went on in producing the assignments also turned up in evaluations of Vermont's portfolio-assessment program.
"The irony is that the more engaged teachers are in instructional reforms that call for more classroom collaboration, the thornier this issue is," Ms. Gearhart said.
To teachers like Mr. Cooley, however, such a concern is a nonissue.
"What's the big deal with giving them time to work on it?" he said. "If I was giving them a task, and their first draft was a 2 [on a scale of 1 to 4] that might give me ideas about where my instruction should go. Maybe it means I should give [them] another shot at it."
But other studies also have raised questions about the validity of portfolio scores.
As part of the 1992 National Assessment of Educational Progress in writing, for example, researchers looked to see whether there was a statistical relationship between students' portfolio scores and their scores on the regular writing assessment, in which students have 25 to 50 minutes to write in response to a prompt.
The relationship was little more than chance.
"You'd get better agreement if you coordinated the writing assessment with a normal, multiple-choice test of social studies," said Daniel M. Koretz, a resident scholar at the RAND Institute on Education and Training in Washington.
Some experts suggest that such a result could mean that the two assessments were testing different aspects of writing skill, but Mr. Koretz disagrees.
"My guess is that it reflects more badly on portfolios," he said. "If it's an on-demand test, at least you know they're not getting any help."
"If this kid was applying for a newspaper job, what you would want to know is 'Can this kid write?'" he continued. "Not 'Did this kid manage to assemble a nice portfolio with a lot of help from his mother?'"
On the other hand, some of these same critics say, portfolio assessments can be a powerful means of changing instruction at the classroom level.
Mr. Koretz and his colleagues at RAND surveyed principals and teachers in 80 Vermont schools that took part in the portfolio program. Of the math teachers in the sample, more than half said that, as a result of portfolio assessments, they had devoted more time to teaching problem-solving and communication in their classes. Three-fourths reported having students spend more time applying math knowledge to new situations; roughly 70 percent devoted more class time to making charts, graphs, and diagrams; and a similar proportion allocated more class time to writing reports about math.
Increases also were found in the amount of time students spent working in pairs or small groups.
"We recognize the reliability of portfolio scores is lower," said Brian Gong, who is a consultant to the state education department in Kentucky, where performance assessments brought about similar changes. "We're willing to trade off technical measurement quality for improved instruction."
'A Worthwhile Burden'
There is little question, though, that portfolios are more expensive and time-consuming than standardized tests. A math portfolio in Kentucky, for example, takes an hour to grade, Mr. Gong said. In Vermont in 1993, more than 160 teacher-raters spent five days evaluating fewer than 7,000 portfolios. Add to that, experts say, the time spent training teachers, collecting materials in the classroom, and discussing with students what to include in their portfolios--some of which might also be chalked up to instructional time.
In Vermont, at least, Mr. Koretz's studies have found that teachers consider that extra effort "a worthwhile burden."
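The Vermont scoring figures cited above give a rough sense of that workload. The sketch below is back-of-the-envelope arithmetic only: it treats "more than 160" raters and "fewer than 7,000" portfolios as round numbers, so the result is a ceiling rather than a measured rate. The function name is hypothetical.

```python
def portfolios_per_rater_day(portfolios, raters, days):
    """Average number of portfolios per rater per scoring day."""
    if raters <= 0 or days <= 0:
        raise ValueError("raters and days must be positive")
    return portfolios / (raters * days)


# Round figures from the article: fewer than 7,000 portfolios,
# more than 160 teacher-raters, five days of scoring.
ceiling = portfolios_per_rater_day(portfolios=7000, raters=160, days=5)
print(ceiling)  # 8.75
```

Because the true portfolio count was lower and the true rater count higher, the actual average was below that figure, or fewer than nine portfolios per rater per day.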
"This is new," said Richard P. Mills, the education commissioner in Vermont. "We have only three years of data, and we're getting better." But he noted that 75 percent of the schools that use the portfolio assessments, which are not mandated there, have expanded their use beyond the 4th and 8th grades. And virtually all school districts are using them.
"I think it would be almost impossible to stamp it out," he said.
The "Review Session" series is made possible by a grant from the John D. and Catherine T. MacArthur Foundation.