RAND Urges Overhaul in Vt.'s Pioneering Writing Test

A report on Vermont's pioneering assessment program has found that the state has improved the reliability of the mathematics portion of the assessment but that fundamental problems remain in the way students are assessed in writing.

The Vermont system, which is being closely watched by educators around the country, is the first statewide assessment program to measure student achievement in part on the basis of portfolios.

A 1992 report by the RAND Corporation on the program's first year of wide-scale implementation found that the "rater reliability" in scoring the portfolios--the extent to which different scorers agreed about the quality of the same student's work--was very low. As a result, state education officials temporarily abandoned plans to report the results at the level of schools or groups of school districts and made changes in the testing program. (See Education Week, Jan. 13, 1993.)

In a report released last week, however, RAND researchers say that rater reliability remained low only in the writing assessment. But the flaws were serious enough, they contend, that the state should take steps to overhaul that part of the assessment system.

"In our opinion, it is unrealistic to expect a substantial rate of improvement in the reliability of the writing-portfolio scores unless the program is changed fundamentally," the report says.

The math assessment, by comparison, achieved a "moderately high" level of rater reliability.

"This shows that we can do it," Commissioner of Education Richard P. Mills said. "If you focus on the problem and concentrate your resources on training, portfolio assessments can meet acceptable levels of reliability."

Scoring 'Best Pieces'

Created in 1988, the assessment system is Vermont's first statewide testing program. It has drawn national attention at a time when a number of states and school districts are considering alternatives to traditional multiple-choice tests.

Under the program, 4th and 8th graders are assessed in writing and math in two ways. They take a uniform, standardized test that includes multiple-choice questions and a longer, open-ended question or task. They also assemble portfolios of their work that include either a set number of "best pieces" in a specific area or a combination of best pieces and other samples of work completed throughout the year.

Unlike traditional tests, which are scored by computers, the portfolios were scored by teachers, who rated on a four-point scale how well each student's work met specific criteria.

In writing, for example, teachers evaluated the portfolios for purpose; organization; detail; voice or tone; and grammar, usage, and mechanics. A portfolio received a score of 1.00 if its compositions established a clear purpose "rarely" and a score of 4.00 if they did so "extensively."

The RAND study used a standard statistical measure known as a reliability coefficient, which gauges the extent to which two raters rank a student's work the same way. On this measure, no agreement yields a coefficient of zero, and total agreement yields 1.00.
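
To make the measure concrete, here is a minimal sketch in Python. The article does not say exactly which statistic RAND computed; the sketch assumes a Pearson correlation between two raters' scores for the same portfolios, and the function name and the scores are hypothetical, not Vermont data.

```python
from math import sqrt


def reliability_coefficient(rater_a, rater_b):
    """Pearson correlation between two raters' scores for the same portfolios.

    Assumption: the article does not name the exact statistic used in the
    RAND study; Pearson correlation is one common choice.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(rater_a)
    mean_a = sum(rater_a) / n
    mean_b = sum(rater_b) / n
    # Sum of cross-products and sums of squared deviations from each mean.
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(rater_a, rater_b))
    squares_a = sum((a - mean_a) ** 2 for a in rater_a)
    squares_b = sum((b - mean_b) ** 2 for b in rater_b)
    if squares_a == 0 or squares_b == 0:
        raise ValueError("undefined when a rater gives every portfolio the same score")
    return cross / sqrt(squares_a * squares_b)


# Hypothetical scores on the four-point scale for four portfolios.
first_rater = [1, 2, 3, 4]
second_rater = [2, 1, 4, 3]

print(round(reliability_coefficient(first_rater, first_rater), 2))   # 1.0
print(round(reliability_coefficient(first_rater, second_rater), 2))  # 0.6
```

In this made-up example the two raters never differ by more than one point on any portfolio, yet the coefficient is only 0.6, because the statistic reflects how consistently the raters order the portfolios rather than how close the individual scores are.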

'Low by Any Standards'

The reliability of the scores for the writing portfolio was 0.56 for 4th grade and 0.63 for 8th grade, which, the report says, was only a slight improvement over the previous year and "low by any standards."

For the math portfolios, the reliability coefficients ranged from 0.72 to 0.79, roughly comparable to those for some National Assessment of Educational Progress tests.

"You can expect only gradual improvement in a program of this complexity," said Daniel M. Koretz, a resident scholar at RAND's Institute on Education and Training and the primary author of the study.

"One of the problems of the reform movement, outside of Vermont, is that the public and the policy world are developing unrealistic expectations of how long it will take these things to work," he said.

Mr. Koretz said the improving reliability in math came about in part because much of the portfolio scoring was conducted at a single site over five days. Raters also participated in calibration sessions twice a day in which they rated pre-scored pieces and discussed disagreements.

That the same strategy did not work for the writing portfolio suggests more basic problems in its design, he said. The report says part of the assessment could be improved by making the scoring rubrics simpler and more specific to the genre of writing being evaluated.

In addition, the categories of writing samples that students must include in their portfolios could be narrowed, the report says. Students currently are asked to submit one piece that is a poem, a short story, or a personal narrative--three very different genres.

State education officials were to meet late last week to consider those and other recommendations for improving the writing assessment.

Last week's study focused only on rater reliability. A forthcoming RAND report that places Vermont's fledgling program in a broader context also points out that the new system is bringing about changes in the kind of instruction Vermont students are getting--a major goal of the new assessment system.

'Useful Information'

Virtually every 4th- and 8th-grade teacher in the state has had state-sponsored training in the use of the portfolios. And teachers say they are devoting more time to problem-solving and communication in teaching math, among other changes.

The improving reliability in math also allows state officials to turn to other issues stemming from the new system. Under the current system, for example, some students may get more opportunities than others to revise the work they submit in their portfolios. State officials may have to figure out how to account for such differences, Mr. Koretz said.

For now, Mr. Mills suggested, the results of the assessment are already providing "useful information about student performance against clear standards."

"Even in writing, in 90 percent of the cases, the disagreement between levels was off by only one point," he said.

Overall, Mr. Mills said, the results show that Vermont students were performing at "novice levels" in both writing and math.

The results from portfolio assessments in both subjects were released for the first time this year at the level of supervisory unions, which are groups of school districts. School officials also received copies of the RAND study.

Vol. 13, Issue 10
