Value-added gauges of teacher effectiveness are highly error-prone and shouldn’t make up more than a minimal part of a teacher’s evaluation, contends a report released this week by a high-powered group of academics.
Problems with the gauge include the non-random assignment of students and teachers to classrooms, as well as the fact that value-added measures can’t distinguish between the contributions of multiple teachers over time and tend to be unstable from year to year.
“Such scores should only be a part of an overall comprehensive evaluation,” the authors wrote. “Some states are now considering plans that would give as much as 50 percent of the weight in teacher evaluation and compensation decisions to scores ... Based on the evidence, we consider this unwise.”
The report comes as the latest salvo in the intensifying debate about the appropriate use of value-added measures in making judgments about teachers’ performance.
Many of the report’s authors are measurement experts: Eva Baker, a co-director of the National Center for Research on Evaluation, Standards, and Student Testing, at UCLA; Paul Barton, an associate director of the National Assessment of Educational Progress; Edward Haertel, a former president of the National Council on Measurement in Education; Helen Ladd, a professor at Duke University; Robert Linn and Lorrie Shepard, both professors at the University of Colorado; and Richard Shavelson, a former president of the American Educational Research Association, among others.
Some of these academics also contested the Race to the Top guidelines for similar reasons last year.
In this report, the researchers say that even though value-added measures purport to take socioeconomic factors into account, other differences can skew the estimates—such as inequitable access to health services, special services, smaller classes, and better resources. “Each of those resource differences may have a small impact on a teacher’s apparent effectiveness, but cumulatively they have greater significance,” the report states.
The report presages ill effects if value-added is made too much a part of evaluation systems. Among other things, the report argues such pressure would narrow the curriculum further; cause teachers to focus more heavily on areas that are likely to improve scores; and make it less likely that teachers would want to work with low-income students with lower test scores.
In interviews, other academics who work on value-added methods said that the report’s caution about the misuse of value-added information is wise. Douglas Harris, a professor at the University of Wisconsin who’s written extensively about value-added, said that it makes sense to use other measures, like classroom observations of teacher practice, to counterbalance or offset test scores so that teachers aren’t unduly pressured to focus exclusively on topics likely to appear on tests, or to instruct in a way that emphasizes factual recall over higher-order analysis.
But Harris also had some reservations about the report’s conclusions.
“The only reason we can have this debate is that value-added gives us observations of lots of different students, and we can think of them being a sample, and have a confidence interval around the results,” he said. “You can’t even estimate the error with other measures [such as teacher observations], so we’re holding value-added to a higher standard than the other approaches.”
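Harris’s point—that value-added estimates come with a quantifiable margin of error because they are averages over samples of students—can be illustrated with a minimal statistical sketch. This is not the report’s or any state’s actual model; the class size, score gains, and “true effect” below are invented for illustration only.

```python
import math
import random

# Illustrative only: treat a teacher's value-added estimate as the mean
# of a sample of student score gains. Because it is a sample mean, we can
# attach a confidence interval to it -- something Harris notes is not
# possible for measures like one-off classroom observations.
random.seed(0)

# Hypothetical class of 25 students; gains scattered around a "true"
# teacher effect of 3.0 points, with substantial student-level noise.
gains = [random.gauss(3.0, 8.0) for _ in range(25)]

n = len(gains)
mean = sum(gains) / n
variance = sum((g - mean) ** 2 for g in gains) / (n - 1)
std_err = math.sqrt(variance / n)

# Approximate 95% confidence interval for the value-added estimate.
low, high = mean - 1.96 * std_err, mean + 1.96 * std_err
print(f"estimate: {mean:.2f}, 95% interval: ({low:.2f}, {high:.2f})")
```

With realistic student-level noise, the interval is wide relative to the estimate—which is why a single year’s score can rank a teacher very differently from the next year’s, even when nothing about the teaching has changed.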
The study of the correlation between teacher-observation measures and student achievement is still in its infancy. One of the few studies to make this link, based on Cincinnati’s teacher-evaluation system, did find that differences in teacher effectiveness were picked up by classroom scorers.
And since most uses of value-added have never been studied in depth, it’s hard to say definitively what effects they’d have on teaching and learning, he added.
Daniel Goldhaber, a professor at the University of Washington, in Bothell, said that the report didn’t note that there are few alternatives for measuring teachers’ influence on instructional outcomes.
“I think people are right to point out the potential flaws of [value-added modeling], but it should be compared against what we have, not some nirvana that doesn’t exist,” he said.
But both researchers took pains to note they don’t endorse the public reporting of teachers’ value-added scores in Los Angeles as part of a controversial Los Angeles Times series.
Some of the states that have won Race to the Top grants, such as Rhode Island and Florida, have committed to basing at least 50 percent of a teacher’s evaluation on student achievement. Those states will have to confront the problem that only a portion of teachers currently instruct in tested grades and subjects, however.