Annual student testing has become a mainstay of the U.S. K-12 education system under the federal No Child Left Behind Act, but there remains a sticking point: We haven’t figured out what to do with the results. One of the most hotly contested recent debates is whether to link student test scores to individual teachers, calculate teachers’ apparent contributions to student learning—called “value added”—and use this number as a basis for teacher hiring, tenure, and compensation decisions.
In April, the New York legislature passed a law prohibiting the use of student test scores in decisions concerning teacher tenure. Also recently blocked was a proposal by U.S. Rep. George Miller, D-Calif., to allow “high need” school districts to apply for federal funds that would compensate teachers based partly on student scores. In both cases, efforts to reject high-stakes applications of teacher value-added were led by teachers’ unions.
Why the concern? No Child Left Behind’s testing requirements and sanctions have increased shallow test-prep instruction, counterproductive “gamesmanship” among schools and teachers, and narrowing of the curriculum. Yet the law has had some positive effects: Because of its incentives, schools now focus more on achievement and achievement gaps, and they have better information on which to base curricular and instructional decisions. Depending on how it is used, teacher value-added might amplify both the problems and the benefits of standardized testing.
In response to the increasing interest in teacher value-added, the Wisconsin Center for Education Research, at the University of Wisconsin-Madison, hosted a national conference on value-added modeling in late April. The papers we commissioned were written by distinguished scholars in economics, sociology, educational statistics, and psychometrics. Our goal was to put value-added to the test.
The basic problem with rewarding teachers based on student test scores is that student outcomes are affected by parents and communities as well as schools. It is common sense that educators should be held responsible for what they can control—no more, no less. Therefore, any valid measure of teacher performance has to isolate the role of the teacher from these other factors.
Because of the strong influence of students’ home environments, it was nearly impossible before No Child Left Behind to determine how much schools or teachers contributed to test scores. The shift to annual testing has—potentially—changed all that. By adjusting for where students start at the beginning of each school year, value-added helps account for much of what happens outside schoolhouse walls.
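To make the mechanics concrete, here is a bare-bones sketch of one common formulation: regress this year's scores on last year's scores plus an indicator for each teacher, and read the estimated teacher coefficients as "value added." The data below are simulated and the specification is deliberately minimal; operational models add many more adjustments (student demographics, multiple prior years, statistical shrinkage), so treat this as an illustration of the idea, not a production model.

```python
# Minimal covariate-adjustment value-added sketch (simulated, invented data).
import numpy as np

rng = np.random.default_rng(0)
n_students, n_teachers = 300, 10
teacher = rng.integers(0, n_teachers, n_students)    # each student's teacher
prior = rng.normal(50, 10, n_students)               # last year's test score
true_effect = rng.normal(0, 2, n_teachers)           # unobserved teacher effects
score = 5 + 0.9 * prior + true_effect[teacher] + rng.normal(0, 5, n_students)

# Design matrix: intercept, prior score, and teacher dummies (teacher 0 omitted
# as the baseline category).
dummies = (teacher[:, None] == np.arange(1, n_teachers)).astype(float)
X = np.column_stack([np.ones(n_students), prior, dummies])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# Teacher coefficients, centered so they read as effects relative to the average.
value_added = np.concatenate([[0.0], beta[2:]])
print(np.round(value_added - value_added.mean(), 2))
```

Because the regression conditions on where each student started, a teacher whose students begin far behind is not penalized for their starting point, only credited or debited for their growth.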
In some respects, value-added measures seem to work well. Two economists at the conference presented results of an experiment that found teacher value-added (measured before the experiment) to be a good predictor of student learning for the same teachers when they were randomly assigned to students. Consistent with this finding, another conference paper showed that teacher value-added does not seem to vary much based on the types of students taught. Also, colleagues and I have discovered that teacher value-added scores are positively (though modestly) related to principals’ subjective assessments of teachers, and that this is true across a large number of studies. Clearly, value-added measures offer some useful information.
Problems remain, however. First, value-added models require controversial assumptions—that increasing a student’s test score by one point means the same thing regardless of where the student starts out, for example, and that students are not assigned to teachers based on factors related to their previous achievement. Several conference papers suggested that these assumptions are false. One also showed that different test-scaling methods can lead to very different teacher value-added scores.
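A toy example with invented numbers shows why the scaling assumption matters: an order-preserving rescaling of the same test can reverse which teacher's students appear to gain more, and therefore which teacher appears to add more value.

```python
# Illustration (invented numbers): a monotone rescaling of the same test
# can flip the apparent ordering of two teachers' value-added.
raw = {"Teacher A": (30, 40),   # students start low, gain 10 raw points
       "Teacher B": (70, 78)}   # students start high, gain 8 raw points

rescale = lambda x: x ** 2 / 100  # convex but order-preserving rescaling

for name, (pre, post) in raw.items():
    print(f"{name}: raw gain = {post - pre}, "
          f"rescaled gain = {rescale(post) - rescale(pre):.1f}")
# Teacher A leads on the raw scale; Teacher B leads after rescaling.
```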
Second, value-added measures have some undesirable properties. They are imprecise enough, for example, that it is difficult to distinguish one teacher’s performance clearly from another’s. Partly as a result, teachers’ measured performance appears to vary considerably from year to year, even though their actual performance is probably much more stable.
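A quick simulation, again with invented numbers, illustrates the consequence: even when every teacher's true effect is held perfectly constant, estimation noise comparable in size to the true variation substantially reshuffles the year-to-year rankings.

```python
# Illustration (simulated data): stable true effects, noisy yearly estimates.
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 100
true_effect = rng.normal(0, 1, n_teachers)        # constant across both years

# Each year's value-added estimate = true effect + sampling noise
# (noise assumed comparable in size to the true variation).
year1 = true_effect + rng.normal(0, 1, n_teachers)
year2 = true_effect + rng.normal(0, 1, n_teachers)

# The year-to-year correlation falls well below 1 despite stable performance.
print(f"year-to-year correlation: {np.corrcoef(year1, year2)[0, 1]:.2f}")

# How many teachers rated in the bottom quartile in year 1 stay there in year 2?
q1 = np.argsort(year1)[: n_teachers // 4]
q2 = np.argsort(year2)[: n_teachers // 4]
print(f"bottom-quartile overlap: {len(np.intersect1d(q1, q2))} of {n_teachers // 4}")
```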
Yet, despite their limitations, value-added measures seem to “work”—that is, they contain some useful information about teacher performance. While far from perfect, value-added modeling may well fill important voids in the current strategy for improving schools.
But we cannot evaluate value-added in isolation, or solely in statistical terms. It is important to consider the potential uses of value-added measures compared with current uses of student test scores, and whether other options might exist. I discuss three such alternatives below.
The main strategy now for improving teaching in schools is to reward credentials such as experience, certification, or formal education. Value-added, notwithstanding its flaws, is almost surely a more valid indicator of a teacher’s contribution to student test scores than any credential. If we view the objective as raising student test scores, then the test scores themselves will surely provide a better indication of progress toward that objective. This logic is confirmed by the weak statistical linkages between credentials and student test-score gains, and by the evidence cited above.
This doesn’t mean that credentials should be abandoned, however. University-based education, professional development, and mentoring can all play an important role in improving teaching. Indeed, with teacher value-added measures put into place, teachers and administrators may be more likely to make better decisions about which credentials to obtain, thereby making the credentials more useful.
A second alternative is calculating only school value-added. This would certainly be better than the current federal focus on “adequate yearly progress,” which rewards schools mainly for their success in attracting students from advantaged backgrounds, rather than for their contributions to student learning. (The so-called growth models approved by the U.S. Department of Education do little to fix the problem.)
School value-added may also have some advantages over teacher value-added. First, as the evidence on principal evaluations suggests, most educators already know who the high-value-added teachers are in their schools. Pressure from colleagues might be more than enough to drive others to improve. Also, teacher value-added can be calculated for only a small percentage of teachers—those who teach for several consecutive years in tested grades and subjects. So, some type of school value-added may be necessary. The disadvantage would be that school-level data don’t provide much useful information to individual teachers about their own performance.
Finally, we could just give assessment data, including test subscores, to teachers, without calculating teacher value-added. The advantage of this approach would be that it provides useful information to teachers about how they and their students are doing, information specific enough to help teachers improve. Teacher value-added, in contrast, provides only a summative assessment, which is important for creating incentives but insufficient to drive improvement.
Teacher value-added may turn out to be superfluous if these alternatives are adopted, or it may instead become a complementary addition. We simply don’t know enough—not enough to start outlawing reasonable ideas. Instead, state and federal governments should provide funding to encourage voluntary experimentation with programs developed jointly by teachers and administrators. In this respect, Congressman Miller’s approach is better than New York’s. While I wouldn’t advise the use of teacher value-added as the primary basis for teacher tenure, New York’s ban goes too far by prohibiting any use of value-added in these decisions.
The new era of expanded student testing has provided an immense amount of potentially useful information. Let’s find out just how useful it can be in driving genuine school improvement.