Keeping collections of student work is fine—but unreliable scoring makes the practice anything but effective as an assessment tool.
It's no secret that public schools today lack credibility. We sail from one fad to the next on a tidal wave of jargon, trailing the wreckage of our latest "instructional innovation" behind us, and then wonder why regular people find it tough to believe we know what we're doing.
Take portfolios, for instance. Here in Vermont they tell us we're "leading the nation." If so, the news isn't good for the rest of the country.
What exactly are portfolios? They're folders where kids keep their work. If this sounds like less than a breakthrough, that's only because it is less than a breakthrough.
Not that there's anything wrong with keeping portfolios. Plato probably carried one to Socrates' classroom. But advocates today promote portfolios as "accurate measures of educational performance," offering "credible," "standardized" data. Unfortunately, portfolio scoring simply doesn't work.
Portfolios are scored according to rubrics, which detail theoretically "objective" standards and criteria. In Vermont, for example, writing is judged in five "dimensions"—purpose, organization, details, voice, and a combined grammar/usage/mechanics category—and awarded one of four scores in each: Extensively, Frequently, Sometimes, or Rarely. These scores translate into a four-point scale. Innovators naturally insist that portfolio 4.0's and 3.0's have nothing to do with old-fashioned 4.0's and 3.0's, also known as A's and B's.
Suppose you're trying to rate the details in a piece of writing. If you think the details are "explicit," you score it Extensively. If they're only "elaborated," of course, then you just give it a Frequently. Rating an essay's organization requires that you distinguish between problems that "affect unity or coherence"—these rate a Sometimes—and those that "make writing difficult to follow"—a Rarely. Scoring the voice used in writing is even easier. All you have to do is detect the difference between a "distinctive" tone, an "effective" tone, an "attempt" at an "appropriate" tone, and an "appropriate" tone that's "not evident." Next you can decide if the piece contains errors that "may distract the reader" or errors that "interfere with understanding."
This is "objective"?
By the way, portfolio backers promote all this on the grounds that parents like the idea of clearly defined standards. Unfortunately, most teachers—and most regular people—have a hard time deciding with statistical reliability whether they're being "distracted" or whether their understanding is being "interfered" with.
In order to "calibrate" their judgment with that of the "experts," teachers score sets of writing samples called benchmarks. Inconveniently, Vermont teachers statewide traditionally disagree with experts' benchmark scores more often than they agree with them. Even the "experts" commonly disagree with each other. Which scores are the right scores? You tell me.
In an effort to eliminate the benchmark warfare that has plagued Vermont's scoring system from the start, officials introduced "line pieces." These new samples were supposed to indisputably define the absolute best a kid's composition could be before it deserved the next higher score. Unfortunately, this remedy rested on the breathtaking assumption that raters previously unable to agree on the difference between an "interference" and a "distraction" would somehow be able to agree on the difference between the very best "interference" and the very worst "distraction." Not surprisingly, line pieces haven't helped clear things up.
As a result, portfolio scoring remains notoriously unreliable. Even judges rating the same paper disagree on the score it deserves. The RAND Corp., the state's former assessment consultant, found initial Vermont scoring reliability "so low that most planned uses of data had to be abandoned." RAND described scattered improvements in succeeding years as "slight" and "trivial," with some reliability figures actually declining and "large margins of error" consistently contaminating data. RAND's final report cited only "limited evidence from other programs that reliable scoring of writing portfolios is practical."
Portfolio debacles aren't limited to Vermont. A report commissioned by Kentucky's legislature blasted that state's portfolio-based system as "seriously flawed," concluding that "the public is being misinformed" about statewide results and "misled about the accomplishments of individual students." Portfolio programs are also proving far costlier than anticipated. In the words of one RAND analyst, the "early infatuation" with portfolios was "unrealistic."
Enthusiasts frequently exaggerate the role portfolios themselves play in learning. It's wrong to credit portfolios with the benefits delivered by sound writing and problem-solving strategies simply because kids employ these techniques to complete the work for a portfolio. And it's certainly wrong to misrepresent teacher support for keeping portfolios as an endorsement of portfolios as an assessment tool. You can use portfolios in your classroom without ever engaging in the pseudo-objective scoring, the voluminous record keeping, and the ghastly expense in both money and classroom time.
Portfolios were devised to provide "meaningful, useful data." They don't. That's why they should be abandoned as assessment tools. The last thing our schools need is more sound and fury signifying nothing. Addiction to fashion and blindness to folly won't build anyone's confidence in public education.