14-Nation Student Writing Assessment Called Failure
Chicago--Because judges in different countries could not agree on a common standard for evaluating writing performance, an 11-year, 14-nation project aimed at comparing students' writing performance has failed, researchers said here last week.
"I view this study as an interesting failure, which should make us more modest about writing assessment," the study's director, Alan C. Purves, said at the annual meeting of the American Educational Research Association.
Mr. Purves, a professor of education at the State University of New York at Albany, noted that, although raters within each country were consistent in evaluating the students' work, different countries had differing views of what constituted high-quality writing.
Some countries emphasized such characteristics as style and grammar, while others--such as the United States--tended to focus on the content of the writing samples, he said.
Moreover, the study's director noted, researchers were unable to draw conclusions from students' performance on the assessment about their abilities as writers, since their performance differed depending on the task they were asked to complete.
The findings suggest that those advocating the use of performance assessments in writing and other subjects "need to be more honest" about problems in comparing ratings, Mr. Purves said.
"There is a national bandwagon for performance assessment, which is a wonderful idea," he said. "But none of the people I have heard talking about it talk about the rating problem. In performance assessment, you have to deal with raters."
Eva L. Baker, co-director of the federal research center on assessment at the University of California at Los Angeles, who was involved in the study's development, said it is possible to compensate for such problems.
But, she acknowledged, the procedure--statistically calibrating the ratings to a common standard--is difficult and time-consuming.
Ruth Mitchell, associate director of the Council for Basic Education, who is writing a book about performance assessment, said the failure to come up with international comparisons is not surprising.
Unlike in other subjects, she said, a student's performance in writing depends on the task to be accomplished and the audience intended to be reached, and audiences differ in each country.
"I wouldn't have expected anything else," Ms. Mitchell said. "I thought it was extraordinary they were trying to compare" students in different countries.
U.S. Students Performed Well
The international study, conducted by the International Association for the Evaluation of Educational Achievement, was based on a test administered in 1984 to students in 14 countries.
At the time it was conceived in 1980, the study was considered a bold breakthrough in assessment, which up to that point had been for the most part limited to multiple-choice tests.
In the United States, about 1,500 6th graders, 1,500 10th graders, and 1,000 college-bound 12th graders from 400 schools were tested as part of the study.
The participants were asked to write for one hour each on three topics, such as writing a letter of advice or describing meeting a new friend, that measured their abilities in descriptive, persuasive, and reflective writing.
The results of the U.S. portion of the study indicated that American students performed relatively well on the assessment, compared with their performance on other tests of writing abilities.
In an analysis of the U.S. results completed in 1987, Ms. Baker said, 86 percent of the American students wrote papers judged "competent" or better. She attributed the relatively high level of performance to the fact that the IEA study allowed students considerably more time than other assessments to write their essays. (See Education Week, Sept. 23, 1987.)
But Mr. Purves said last week that additional analyses of the data showed that the longer time period did not appear to help many American students. In the United States, he said, many students turned in their papers in 35 minutes, the same amount of time they typically spend on in-class writing assignments.
"They are psychologically set up to do a 35-minute rough draft," he said.
Moreover, Mr. Purves said, the study found that it is impossible to make judgments about students' overall writing abilities from the data. Not only did students perform differently depending on the task, he noted, but the researchers also found that the factors that appeared to relate to performance differed for different tasks.
The results "are not saying X kid is a good writer," Mr. Purves said. "They say, 'This kid did pretty well on this task today.'"
Ms. Mitchell pointed out that many state writing assessments, as well as the National Assessment of Educational Progress, have recognized this problem, and have devised their tests to provide separate scoring scales for various modes of writing.
"A problem with writing assessment has been an attempt to get a generic prompt for 'writing,'" she said.
In addition to being unable to draw conclusions about students' overall writing abilities, the researchers were unable to compare student performance across national boundaries, Mr. Purves said.
'Perceived Drafting Quality'
Based on each country's culture and curriculum, he said, judges tended to give greater weight to either the content or the style of a writing sample.
In the United States, for example, judges tended to value content more highly. This may be because few students in this country are taught to edit their work, and the raters evaluated the essays accordingly, Mr. Purves suggested.
"A lot of writing assessment is based on PDQ--perceived drafting quality, or what the raters see as the quality of the draft," he said. "It may not have anything to do with the writing performance or skill."
Despite the inability to make comparisons, Mr. Purves noted that the study did produce some findings that could be explored in further research.
For example, he pointed out, although girls tended to outperform boys on all tasks, the gap appeared greatest in countries that emphasized style and grammar.
"In those countries where the perception of raters was guided more by content," he said, "boys and girls come together."
Mr. Purves also suggested that the study was useful in pointing out the limitations of tests to measure writing ability.
"In writing assessment, researchers need to make much more modest claims than they have in the past," the study director said. "I don't think we would have learned this without making this attempt."