Lost in a Haze of Pseudo-Objective Scoring, We've Forgotten Why We Keep Students' Work
It's no secret that public schools today lack credibility. We sail from one fad to the next on a tidal wave of jargon, trailing the wreckage of our latest "instructional innovation" behind us, and then we wonder why regular people find it tough to believe we know what we're doing.
Take portfolios, for instance. Here in Vermont they tell us we're "leading the nation." If so, the news isn't good for the rest of the country.
What exactly are portfolios? They're folders where kids keep their work. If this sounds like less than an educational breakthrough, that's only because it is less than an educational breakthrough.
Not that there's anything wrong with keeping portfolios. Plato probably carried one to Socrates' classroom. But boosters today promote portfolios as "accurate measures of educational performance," offering "credible," "standardized" data. Unfortunately, portfolio scoring simply doesn't work.
Portfolios are scored according to rubrics, which detail theoretically "objective" standards and criteria. In Vermont, for example, writing is judged in five "dimensions"--Purpose, Organization, Details, Voice, and Grammar, Usage, and Mechanics--and awarded one of four scores in each category--Extensively, Frequently, Sometimes, or Rarely. These scores translate into a four-point scale. Innovators naturally insist that portfolio 4.0's and 3.0's have nothing to do with old-fashioned 4.0's and 3.0's--also known as A's and B's.
Suppose you're trying to rate a piece's Details. If you think the details are "explicit," you score it an Extensively. If they're only "elaborated," of course, then you just give it a Frequently. Rating Organization requires that you distinguish between problems that "affect unity or coherence"--a Sometimes--and those that "make writing difficult to follow"--a Rarely. Scoring Voice is even easier. All you have to do is detect the difference between a "distinctive" tone, an "effective" tone, an "attempt" at an "appropriate" tone, and an "appropriate" tone that's "not evident." Next you can decide if the piece contains errors that "may distract the reader," or errors that "interfere with understanding."
This is "objective"?
By the way, portfolio folks promote all this on the grounds that parents "like the idea of clearly defined standards." Unfortunately, most teachers--and most regular people--have a hard time deciding, with statistical reliability, whether they're being "distracted" or their understanding is being "interfered" with.
In order to "calibrate" their judgment with that of the "experts," teachers score sets of writing samples called benchmarks. Inconveniently, Vermont teachers statewide traditionally disagree with experts' benchmark scores more often than they agree with them. Even the "experts" commonly disagree with each other. Which scores are the "right" scores? You tell me.
In an effort to eliminate the benchmark warfare that's plagued scoring from the start, officials introduced "line pieces." These new samples were supposed to indisputably define the absolute best a kid's composition could be before it deserved the next higher score.
Unfortunately, this remedy rested on the breathtaking assumption that raters previously unable to agree on the difference between an "interference" and a "distraction" would somehow be able to agree on the difference between the very best "interference" and the very worst "distraction." Not surprisingly, line pieces haven't helped clear things up.
As a result, portfolio scoring remains notoriously unreliable. This means raters judging the same papers typically can't agree with each other. The RAND Corp.--the state's former assessment consultant--found initial Vermont scoring reliability "so low that most planned uses of data had to be abandoned." RAND described scattered improvements in succeeding years as "slight" and "trivial," with some reliability figures actually declining. "Large margins of error" have consistently contaminated data. RAND's final report cited only "limited evidence from other programs that reliable scoring of writing portfolios is practical."
According to the U.S. Department of Education and an array of testing specialists, portfolios also commonly suffer from validity problems. "Even perfectly reliable scoring," they report, wouldn't guarantee that portfolios are measuring the skills they're intended to assess.
Portfolio debacles aren't limited to Vermont. A report commissioned by Kentucky's legislature blasted that state's portfolio-based system as "seriously flawed," concluding that "the public is being misinformed" about statewide results and "misled about the accomplishments of individual students." Portfolio programs are also proving far costlier than anticipated. In the words of one RAND analyst, the "early infatuation" with portfolios was "unrealistic."
Enthusiasts frequently exaggerate the role portfolios themselves play in learning. It's wrong to credit portfolios with the benefits delivered by the writing process and sound problem-solving strategies simply because kids employ these techniques to complete the work you find in their portfolios.
It's certainly wrong to misrepresent teacher support for portfolio keeping as an endorsement of portfolio assessment. You can use portfolios in your classroom with your kids without ever engaging in the pseudo-objective scoring, the voluminous record keeping, and the ghastly expense in both money and classroom time.
Portfolios were devised to provide "meaningful, useful data." They don't. That's why, as assessment tools, they should be abandoned.
The last thing our schools need is more sound and fury, signifying nothing. Addiction to fashion and blindness to folly won't build anyone's confidence in public education.
Vol. 17, Issue 18, Page 76. Published in print: January 14, 1998, as "Portfolio Folly."