Teaching Profession Commentary

Value-Added: It’s Not Perfect, But It Makes Sense

By Steven Glazerman, Dan Goldhaber, Susanna Loeb, Douglas Staiger, Stephen Raudenbush & Grover Whitehurst
December 15, 2010

The vast majority of school districts presently employ teacher-evaluation systems that result in nearly all teachers’ receiving the same (top) rating. For instance, a recent study of 12 districts in four states by the New Teacher Project revealed that more than 99 percent of teachers in districts using binary ratings were rated satisfactory, while 94 percent received one of the top two ratings in districts using a broader range of ratings. As U.S. Secretary of Education Arne Duncan put it during his bus tour this fall, “Today in our country, 99 percent of our teachers are above average.”

The reality is far different from what the evaluation systems suggest. We know from a large body of empirical research that teachers differ dramatically from one another in effectiveness. Because today’s evaluation systems fail to recognize these differences, many important human-resource decisions are not as efficient or fair as they could be if they incorporated data that meaningfully differentiated among teachers.

See Also

For an opposing view on value-added measurement, see “Public Displays of Teacher Effectiveness” (December 15, 2010).

Newer teacher-evaluation systems seek to incorporate information about individual teachers based on value-added measures of a teacher’s contribution toward student achievement. The teacher’s contribution can be estimated in a variety of ways, but typically entails some variant of subtracting the achievement-test scores of a teacher’s students at the beginning of the year from their scores at the end of the year, and making statistical adjustments to account for differences in student learning that might result from student background or schoolwide factors outside the teacher’s control. These adjusted gains in student achievement are compared across teachers.
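
To make the arithmetic concrete, here is a minimal sketch of the simplest variant described above: each student's gain is the spring score minus the fall score, and each teacher's average gain is then compared with the average gain in the same school. The records, the teacher and school labels, and the choice of a school-mean adjustment are all illustrative assumptions. Real value-added models typically use regression with many more controls.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, school, fall_score, spring_score)
records = [
    ("A", "S1", 50, 60),
    ("A", "S1", 40, 52),
    ("B", "S1", 55, 59),
    ("B", "S1", 45, 51),
    ("C", "S2", 60, 75),
    ("C", "S2", 50, 63),
    ("D", "S2", 52, 70),
    ("D", "S2", 48, 64),
]

def value_added(records):
    """Mean student gain per teacher, minus the mean gain in that teacher's school.

    The school-mean subtraction is a crude stand-in for the statistical
    adjustments for schoolwide factors; it is an assumption of this sketch.
    """
    gains_by_school = defaultdict(list)
    gains_by_teacher = defaultdict(list)
    school_of = {}
    for teacher, school, fall, spring in records:
        gain = spring - fall  # end-of-year score minus beginning-of-year score
        gains_by_school[school].append(gain)
        gains_by_teacher[teacher].append(gain)
        school_of[teacher] = school
    school_mean = {s: mean(g) for s, g in gains_by_school.items()}
    return {
        t: mean(g) - school_mean[school_of[t]]
        for t, g in gains_by_teacher.items()
    }

scores = value_added(records)
for teacher in sorted(scores):
    print(teacher, scores[teacher])
```

In this made-up data, teacher C's students gain more on average (14 points) than teacher A's (11 points), yet C's adjusted score is below A's because gains in C's school are higher overall. That is the kind of difference the adjustments are meant to account for.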

Researchers have pointed out that value-added estimates for individual teachers fluctuate from year to year and can be influenced by factors over which the teacher has no control. The technical issues that have been raised about value-added measures would arise in one form or another with respect to any evaluation of complex human behavior. We believe the correct response to these concerns is to improve value-added measures continually and use them wisely. We should not discard or ignore the information they contain. With that goal in mind, we address four frequently cited concerns about the value-added evaluation of teachers.

• The Use of Value-Added Information

Much of the controversy surrounding teacher-performance measures that incorporate value-added is based on fears about how the information will be used. After all, once administrators have ready access to a quantitative performance measure, they can use it for such sensitive human-resource decisions as teacher pay, promotion, and layoffs. Administrators may or may not do this wisely or well, and it is reasonable for those who will be affected to express concerns.

We believe that whenever human-resource actions are based on evaluations of teachers, they will benefit from incorporating all the available information that improves prediction of student outcomes, which includes value-added measures. Full-throated debate and research on policies such as merit pay and “last in, first out” layoffs should continue, but we should not let controversy over the uses of teacher-evaluation information stand in the way of developing and improving measures of teacher performance.

• Trading Classification Errors to Benefit Students

The common thread in technical critiques of value-added evaluation is that teachers subjected to it will often be misclassified, e.g., a teacher who is identified as “ineffective” is, in fact, “average.” Given the typical reliability of value-added measures, there is no doubt that such misclassifications will occur with some frequency. However, we must recognize that all decision-making systems generate classification errors, including those used today. Moreover, different types of errors have different consequences.

In the case of teacher value-added, the focus has been almost entirely on so-called false-negative errors, i.e., teachers who are falsely classified as ineffective because the measures are not perfectly reliable. But framing the problem in terms of false negatives places the focus almost entirely on the interests of the teacher who is being evaluated rather than the students who are being served.

In the simplest of scenarios involving tenure of novice teachers, it is in the best interest of students to have a high bar set for effectiveness, thereby increasing the proportion of truly effective teachers staffing classrooms (i.e., fewer false positives); by contrast, it is in the best interest of novice teachers to have a low bar set for effectiveness, thereby making it more likely that they will be granted tenure (i.e., fewer false negatives). The administrator must trade off one type of classification error for the other when deciding how high to set the cut score for effectiveness based on teacher-evaluation scores.
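
The trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real district: each hypothetical teacher has a true effect drawn from a standard normal distribution, the observed evaluation score is that effect plus equally large random noise, and a teacher counts as "truly effective" when the true effect is at least zero. As in the text, a false positive is an ineffective teacher who clears the bar, and a false negative is an effective teacher who does not.

```python
import random

def classification_errors(teachers, cut):
    """Return (false_positives, false_negatives) for a given cut score.

    teachers: list of (true_effect, observed_score) pairs.
    A teacher is treated as truly effective when true_effect >= 0
    (an assumption of this sketch).
    """
    false_pos = sum(1 for true, obs in teachers if obs >= cut and true < 0)
    false_neg = sum(1 for true, obs in teachers if obs < cut and true >= 0)
    return false_pos, false_neg

# Simulated teachers: observed score = true effect + measurement noise.
rng = random.Random(0)
teachers = []
for _ in range(1000):
    true_effect = rng.gauss(0, 1)
    teachers.append((true_effect, true_effect + rng.gauss(0, 1)))

cuts = [-1.0, -0.5, 0.0, 0.5, 1.0]
results = {cut: classification_errors(teachers, cut) for cut in cuts}
for cut in cuts:
    false_pos, false_neg = results[cut]
    print(f"cut={cut:+.1f}  false positives={false_pos}  false negatives={false_neg}")
```

Whatever the simulated data, raising the cut score can only remove false positives and can only add false negatives, so the two error counts move in opposite directions as the bar rises.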

We believe that the concern with the effects of misclassification on teachers should be balanced by a concern with the effects on students.

• The Setting of Realistic Value-Added Benchmarks

The correlation of value-added measures of teaching effectiveness between one school year and the next lies between .20 and .60 across multiple studies, with most estimates lying between .30 and .40. A measure that has a correlation of .35 from one year to the next will result in a significant number of classification errors, consistent with our previous point. But is the amount of error in classification too high to be tolerated?
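
One way to get a feel for what a year-to-year correlation of .35 implies is to simulate it. The sketch below assumes that scores in both years are standard normal and are linked only through the chosen correlation. It then asks what share of the simulated teachers in the bottom quarter in year one land in the bottom quarter again in year two. The function names, the sample size, and the bottom-quarter definition are illustrative assumptions, not figures from the studies cited.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate(n, r, seed=0):
    """Draw n pairs of scores whose population correlation is r."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        year1.append(z1)
        # Mixing z1 with independent noise yields correlation r with year 1.
        year2.append(r * z1 + math.sqrt(1 - r * r) * z2)
    return year1, year2

def bottom_quartile_repeat_share(year1, year2):
    """Share of year-1 bottom-quartile teachers also in the year-2 bottom quartile."""
    n = len(year1)
    k = n // 4
    low1 = set(sorted(range(n), key=lambda i: year1[i])[:k])
    low2 = set(sorted(range(n), key=lambda i: year2[i])[:k])
    return len(low1 & low2) / k

year1, year2 = simulate(20000, 0.35)
r_hat = pearson(year1, year2)
share = bottom_quartile_repeat_share(year1, year2)
print(f"sample correlation: {r_hat:.2f}")
print(f"bottom quartile in both years: {share:.0%}")
```

If the scores carried no signal at all, about one quarter of the year-one bottom group would repeat by chance, so the gap between the printed share and 25 percent is a rough gauge of how much information a .35 correlation carries.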

It is instructive to look at other sectors of the economy as a gauge for judging whether value-added measures are sufficiently stable to be used for high-stakes decisions. In health care, patient volume and patient-mortality rates for surgeons and hospitals are publicly reported on an annual basis by private organizations and federal agencies and have been formally approved as quality measures by national organizations. Yet patient volume is only modestly correlated with patient outcomes, and the year-to-year correlations in patient-mortality rates are well below .5 for most medical and surgical conditions. Nevertheless, these measures are used by patients and health-care purchasers to select providers because they are able to predict larger differences across medical providers in patient outcomes than other available measures are.

In a similar vein, the volume of home sales for real estate agents, returns on investment funds, college-entrance examinations, productivity of field-service personnel for utility companies, output of sewing-machine operators, and baseball batting averages predict future performance only modestly. A meta-analysis of 22 studies of objective performance measures found that the year-to-year correlations in high-complexity jobs ranged from .33 to .40, consistent with value-added correlations for teachers.

Despite these modest predictive relationships, real estate firms rationally try to recruit last year’s volume leader from a competing firm; investors understandably prefer investment firms with above-average returns in a previous year; colleges select students with higher entrance-exam scores; and baseball batting averages in a given year have large effects on player contracts. The between-season correlation in batting averages for professional baseball players is .36. Ask any manager of a baseball team whether a player’s batting average from the previous year is relevant in making decisions about the present year.

We should not set unrealistic expectations for the reliability or stability of value-added analysis. Value-added evaluations are as reliable as those used for high-stakes decisions in many other fields.

• The Reliability of Value-Added as a Measurement of Effectiveness

We know a good deal about how other means of classifying teachers perform relative to value-added. Rather than asking value-added to measure up to an arbitrary standard of perfection, it would be productive to ask how it performs compared to classification based on other forms of available information on teachers.

Here the research is quite clear: If student test achievement is the desired outcome, value-added is superior to other existing methods of classifying teachers. Classification that relies on other measurable characteristics of teachers (e.g., scores on licensing tests, routes into teaching, the path to certification, National Board for Professional Teaching Standards certification, teaching experience, quality of undergraduate institution, relevance of undergraduate coursework, extent and nature of professional development), considered singly or in aggregate, is not in the same league in predicting future performance as evaluation based on value-added.

We have a lot to learn about how to improve the reliability of value-added indicators and other sources of information on teacher effectiveness, as well as how to build useful personnel policies around such information. However, too much of the debate about value-added assessment of teacher effectiveness has proceeded without consideration of the alternatives and by conflating objectionable personnel policies with value-added information itself.

When teacher evaluation that incorporates value-added data is compared against an abstract ideal, it can easily be found wanting in that it provides only a fuzzy signal of teacher effectiveness. But when it is compared to performance assessment in other fields or to evaluations of teachers based on other sources of information, it becomes obvious that even a fuzzy signal of teacher effectiveness, if it is the best available signal, can be a vast improvement over no signal.

Teachers differ dramatically in their performance, with large consequences for students. Staffing policies that ignore this reality lose one of the strongest levers for lifting the performance of schools and students. That is why there is great interest in establishing teacher-evaluation systems that meaningfully differentiate performance.

Teaching is a complex task, and value-added captures only a portion of the impact of differences in teacher effectiveness. Thus, high-stakes decisions based on value-added measures of teacher performance will be imperfect. We do not advocate using value-added measures alone when making decisions about hiring, firing, tenure, compensation, placement, or teacher development, but surely value-added information ought to be in the mix given the empirical evidence that it predicts more about what students will learn from the teachers to whom they are assigned than any other source of information.

A version of this article appeared in the January 12, 2011 edition of Education Week as Value-Added: It’s Not Perfect, But It Makes Sense

