Princeton Study Takes Aim at ‘Value-Added’ Measure

By Debra Viadero, June 23, 2009

Merit pay for teachers is an idea that seems to be getting increasingly popular these days with politicians on both sides of the aisle. But if performance-pay plans are going to succeed, they require an evaluation system that is widely seen to be fair and accurate.

A lot of policy makers are pinning their hopes on “value-added” measures of student achievement to fill that bill. The thinking is that value-added models provide a fairer measure of teacher effectiveness because they track students’ year-to-year learning gains, rather than their absolute levels of achievement. That way, teachers are not getting undeserved blame for the learning deficits that students bring with them to the classroom or undue rewards for being blessed with a classroom of high achievers.
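To make the distinction concrete, here is a minimal sketch comparing average score levels with average year-to-year gains. The two classrooms and all of their scores are made up for illustration, and real value-added models are regression-based and considerably more elaborate than a simple difference in scores.

```python
from statistics import mean

# Hypothetical data: (prior-year score, current-year score) for each student.
classroom_a = [(40, 52), (45, 58), (38, 49)]  # students who start out behind
classroom_b = [(85, 88), (90, 92), (80, 84)]  # students who start out ahead


def mean_level(classroom):
    """Average current-year score: the 'absolute level' of achievement."""
    return mean(current for _, current in classroom)


def mean_gain(classroom):
    """Average year-to-year gain: a crude stand-in for a value-added measure."""
    return mean(current - prior for prior, current in classroom)


for name, classroom in [("A", classroom_a), ("B", classroom_b)]:
    print(name, "level:", mean_level(classroom), "gain:", mean_gain(classroom))
```

In this invented example, classroom B has the higher average score, but classroom A shows the larger average gain, so a ranking based on levels and a ranking based on gains would point in opposite directions.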

A forthcoming study, however, suggests policy makers might want to think twice before embracing value-added measures of teacher effectiveness. In a paper due to be published in February in the Quarterly Journal of Economics, Princeton University economist Jesse Rothstein uses some sophisticated modeling techniques to suggest that such measures could be based on shaky assumptions.

Using student-testing data from North Carolina, Rothstein makes his case by developing a “falsification” test for value-added models. For example, he wondered, would the model show that 5th grade teachers have effects on their students’ test scores in 3rd and 4th grades? Since it’s impossible for students’ future teachers to cause their previous achievement outcomes, Rothstein reasons, there should be no such effects.

But in fact there were, and they were quite large. Rothstein says this happens because students are not randomly sorted into classrooms. A principal, for example, might assign a large number of students with behavior problems to a teacher who is known to have a way with problem students, or parents of high achievers might lobby to get their children into a class with the “best” teacher. When that happens, though, it biases the results of value-added calculations.
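The intuition behind a check like this can be sketched with a toy simulation. This is not Rothstein's actual model, which is regression-based and uses real North Carolina data. The function name, class sizes, and the extreme "sort everyone by last year's gain" assignment rule are all assumptions chosen for illustration. The simulated 5th grade teachers have no effect on anything here. The only thing that varies is how students are assigned to them.

```python
import random
from statistics import mean, pstdev


def future_teacher_spread(sort_students, n_students=600, n_classes=20, seed=0):
    """Spread of average 4th grade gains across 5th grade classrooms.

    A large spread makes 5th grade teachers look as if they "affect"
    gains that occurred before the students ever reached them.
    """
    rng = random.Random(seed)
    # 4th grade gains are generated before any 5th grade assignment happens.
    prior_gains = [rng.gauss(0, 1) for _ in range(n_students)]

    if sort_students:
        # Assumed, deliberately extreme tracking: group students by prior gain.
        roster = sorted(prior_gains)
    else:
        # Random assignment: shuffle students before splitting into classes.
        roster = prior_gains[:]
        rng.shuffle(roster)

    size = n_students // n_classes
    class_means = [
        mean(roster[i * size:(i + 1) * size]) for i in range(n_classes)
    ]
    return pstdev(class_means)


random_spread = future_teacher_spread(sort_students=False)
sorted_spread = future_teacher_spread(sort_students=True)
print("random assignment:", round(random_spread, 2))
print("sorted assignment:", round(sorted_spread, 2))
```

Under random assignment, the classroom averages of earlier gains differ only by chance, so the spread is small. Under sorted assignment the spread is much larger, even though no teacher in the simulation influences past scores. That is the pattern a falsification test of this kind is designed to flag.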

If this study sounds familiar, it’s because it’s been circulating a while and gathering lots of buzz. I’m not yet sure why the finding that a 5th grade teacher seems to cause students’ 3rd and 4th grade achievement automatically implies that students were not randomly sorted, but I hope to figure that out. Look for a more detailed story from me soon in Education Week.

In the meantime, you should know about two other studies that offer some counterpoint to Rothstein’s findings. Thomas J. Kane and Douglas O. Staiger, for one, conducted a small experiment in Los Angeles public schools to see if value-added calculations would match the experimental results. They did. See their paper, “Are Teacher-Level Value-Added Estimates Biased?: An Experimental Validation of Non-Experimental Estimates.”

A second study, a working paper by Cory Koedel and Julian R. Betts, suggests that the kinds of biases that Rothstein highlights in his paper can be overcome with more complex value-added models.

You can find a summary of Rothstein’s paper and the full text on Princeton’s website, where they were posted yesterday.

A version of this news article first appeared in the Inside School Research blog.