Responding quickly to a report that found serious problems in the state’s pioneering portfolio-assessment program, Vermont officials are redesigning the system to improve its technical quality.
The report, issued last month by the RAND Corporation, found that rater reliability--the level of agreement between raters about the quality of a student’s work--was low. The study recommended that only state-level results--which it found were reliable--be reported to the public. (See Education Week, Dec. 16, 1992.)
Vermont is the only state to measure student performance in part on the basis of portfolios.
In the weeks since the report was released, officials from the state education department have met with RAND researchers and teachers to discuss changes that might boost reliability, and have agreed to a number of revisions.
Among other changes, the officials have decided to revamp the way teachers are trained to score portfolios, and to select a group of highly trained teachers to generate state and regional data.
Susan L. Rigney, the director of the assessment program in the department, predicted that the changes would “substantially alleviate” the problems identified in the RAND report.
But Commissioner of Education Richard P. Mills also noted that improving the reliability of the assessment would take time. He suggested that, while the state may report results next year at the level of supervisory unions, or groups of school districts, it probably will not be able to report the results from each school’s portfolios.
“One lesson is clear from the RAND study: This is new,” Mr. Mills said. “One can’t expect to accomplish the task in one leap.”
Report Confirms Teacher Views
Created in 1988, Vermont’s assessment system measures 4th and 8th graders’ abilities in mathematics and writing in three ways: a uniform test that includes multiple-choice and open-ended questions; a portfolio of student work done in classrooms over the course of a year; and a “best piece” culled from the portfolio.
The program was conducted on a pilot basis in the 1990-91 school year and expanded to include most schools in the state the following year. State officials had planned to report the results from the 1991-92 assessment for each supervisory union.
But the officials also asked RAND, which is part of a consortium that makes up the National Center for Research on Evaluation, Standards, and Student Testing, to analyze the project.
In its report, issued last month, RAND found that the reliability was “low enough to limit seriously the uses of the 1992 assessment results,” and urged the state to report state-level average scores only.
The report suggested that the low levels of reliability may stem from problems in training teachers to score the portfolios, the criteria for evaluating them, and the lack of standardization of tasks.
According to Ms. Rigney, the findings confirmed anecdotal reports from teachers involved in the project.
“We are trying to listen to and act on information from both sources,” she said.
Training, Scoring Changes
In response to the findings, state officials and teachers over the past month developed a number of changes in the program.
They agreed, for example, to redesign the training sessions to provide more immediate feedback to teachers. Under the previous system, teachers received training during the course of a year, then at the end of the year exchanged their students’ portfolios with other raters, who provided them with feedback on their scoring.
Under the new system, teachers will score portfolios at each training session along with other raters, to inform them immediately about their scoring.
The officials also agreed to select a group of highly qualified raters to conduct the scoring that will be used to generate the state and supervisory-union data.
These teachers--30 to 40 from each grade level for both writing and math--will spend a week, with pay, during the summer evaluating portfolios. The system will be similar to one used in Pittsburgh’s portfolio-assessment program.
Ms. Rigney emphasized, however, that the use of a select group of raters is only an “interim” remedy.
“A key feature in this [assessment] program is to have every teacher trained and skilled in using the scoring system,” she said. “The goal is to have every teacher score their own portfolios and aggregate the data.”
“We’re simply not at that point yet,” the assessment director said.
Not ‘Off the Shelf’
In addition to revising the training and scoring, officials also agreed to revise the contents of the portfolios, particularly in mathematics, by recommending that a standardized task be included in each portfolio. Such a move, officials said, would provide a check on the scoring by ensuring that all raters were evaluating the same task.
Officials also urged teachers not to include puzzles in the math portfolios.
Puzzles--for example, asking students to determine how to time the boiling of a 15-minute egg using a seven-minute and an 11-minute timer--were one of three types of tasks that had been recommended for inclusion. But teachers had found that they were difficult to score.
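The egg-timer puzzle has a well-known solution that a few lines of arithmetic can check; the sketch below is a hypothetical illustration of one classic answer, assuming flippable sand timers, and is not material from the Vermont portfolios:

```python
# One classic solution, assuming flippable sand timers:
#   t=0   start both timers and the egg
#   t=7   the 7-minute timer empties; flip it at once
#   t=11  the 11-minute timer empties; flip the 7-minute timer again,
#         returning the 4 minutes of sand that have fallen since t=7
#   t=15  the 7-minute timer empties; the egg has boiled 15 minutes
seven_flipped = 7                        # first flip of the 7-minute timer
eleven_done = 11                         # the 11-minute timer runs out
sand_fallen = eleven_done - seven_flipped  # 4 minutes have drained
egg_done = eleven_done + sand_fallen       # flipping runs those 4 minutes again
print(egg_done)  # 15
```

Scoring such open-ended reasoning tasks consistently is exactly the kind of judgment call that teachers reported finding difficult.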
Daniel M. Koretz, the author of the RAND report, applauded the state for recognizing that adjustments were needed as the assessment program is implemented.
“This is not something you buy off the shelf,” he said. “It’s a long-term developmental effort.”