Teacher Evaluation: An Issue Overview
Teacher evaluations matter a lot—both to teachers and to those holding them accountable. But how can schools measure the performance of all teachers fairly? And what should they do with the results?
In general, teacher evaluation refers to the formal process a school uses to review and rate teachers’ performance and effectiveness in the classroom. Ideally, the findings from these evaluations are used to provide feedback to teachers and guide their professional development.
While governed by state laws, teacher-evaluation systems are generally designed and operated at the district level, and they vary widely in their details and requirements. Traditionally, teacher evaluation systems relied heavily on classroom observations conducted by principals or other school administrators, sometimes with the help of rubrics or checklists. Samples of students’ work, teachers’ records and lesson plans, and other relevant factors were also often taken into account.
But many evaluation systems have undergone significant changes in recent years. Indeed, by the end of the 2000s, teacher evaluation, long an ignored and obscure policy element, had become one of the most prominent and contentious topics in K-12 education.
That surprise reversal can be attributed to at least four factors: a wave of new research on teacher quality, philanthropic interest in boosting teacher effectiveness, efforts by advocacy groups and policymakers to revamp state laws on evaluation, and political pressure to dismiss poorly performing teachers.
All that momentum aside, the results of recent changes to teacher-evaluation systems are, as yet, difficult to quantify. Most of the new data show that a great majority of teachers score just as highly on the new evaluations as they did on the previous ones, and it is unclear whether the reforms have systematically, or even broadly, led teachers to receive better feedback that translates into better teaching.
Why has teacher-performance evaluation become such a central education issue?
Beginning in the 1990s and through the 2000s, analyses of year-to-year student-test data consistently showed that some teachers helped their students learn significantly more than did other teachers. One widely cited paper, by Stanford University economist Eric A. Hanushek, estimated that the top-performing teachers helped students gain more than a grade’s worth of learning; students taught by the worst achieved just half a year of learning.
Advocacy groups argued that current quality-control systems for teachers were ineffectual. In an influential 2009 report, TNTP (formerly the New Teacher Project) found that more than 99 percent of teachers in the 12 districts it studied were rated satisfactory on evaluations and that tenured teachers were almost never fired. The analysis suggested that most of the reviews were perfunctory and did not distinguish between skilled and low-performing teachers.
For some advocates, such findings opened an opportunity to strengthen the profession. Revamping teacher evaluation, they argued, would help to give teachers better information on strengths and weaknesses and help districts tailor ongoing supports. Some policymakers, though, focused more closely on the prospect of identifying and removing bad teachers quickly and efficiently.
Federal intervention gave muscle to the focus on teacher evaluations. Using $4.3 billion provided through the 2009 American Recovery and Reinvestment Act, the U.S. Department of Education began the Race to the Top competition, offering grants to states that agreed to make certain policy changes. Among the prescribed changes was the requirement to develop and implement new teacher-evaluation systems that differentiated among at least three levels of performance and took student achievement into account.
Major philanthropies also helped to fuel activity around teacher evaluation. The Bill & Melinda Gates Foundation, for instance, spent some $700 million on teacher-quality initiatives alone, much of it on attempts to set up improved teacher-evaluation systems in a handful of school districts.
Prodded by those incentives, states rushed to rewrite laws governing teacher evaluation.
By 2013, 28 states had moved to require teachers to be evaluated annually, up from 15 in 2009, and 41 states required consideration of student-achievement data, up from 15 in 2009, according to one tally. (Because teacher evaluation remains a state and local priority, all of the policies are drafted at those levels. District collective bargaining agreements can add further nuances. Consequently, what constitutes, say, a “proficient” teacher in one state may not be the same as in other states, or in the district next door, for that matter.)
As legislators overhauled the systems, some states also took steps to connect the new evaluation systems to other policies, including teacher compensation, promotion, and dismissal.
A 2010 Colorado law, for instance, permits schools to return tenured teachers who receive several poor evaluations to probationary status. Florida’s law requires districts to pay more to teachers who score well on the state’s new evaluations. Rhode Island prohibits a student from being instructed for two consecutive years by a teacher deemed “ineffective.” In other states, evaluation results can be used as evidence for dismissing a tenured teacher for poor performance.
How do the new teacher-evaluation systems work?
The new evaluation systems are far more complex than previously used checklists. They consist of several components, each scored individually. Most of them heavily weigh periodic observations of teachers keyed to teaching standards, such as the well-known Framework for Teaching developed by consultant Charlotte Danielson. Districts and states differ in how frequently they require teachers to be observed, whether the observations must be announced beforehand, and who conducts them.
Policymakers also sought more objective measures in the systems because of concerns that personal relationships made it more difficult for principals to rate teachers accurately. The inclusion of student test scores, for example, was a requirement under the federal initiatives.
The most sophisticated approach uses a statistical technique known as a value-added model, which attempts to filter out sources of bias in the test-score growth so as to arrive at an estimate of how much each teacher contributed to student learning. Critics of the approach point to studies showing that the estimates are, in the words of one U.S. Department of Education publication, “subject to a considerable degree of random error.” (States without the capacity to use value-added have adopted simpler—and potentially even more problematic—growth measures.)
States and districts use a predetermined weighting formula to compile results from the components and arrive at a teacher’s final score. Many states initially based half of each teacher’s review on student achievement, but some have scaled back that proportion since.
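The arithmetic behind such a weighting formula can be sketched in a few lines of Python. This is an illustration only, not any state's actual formula: the component names, the 1-to-4 rating scale, the sample scores, and both sets of weights are hypothetical, chosen to show how scaling back the student-achievement share can change a final score.

```python
def composite_score(component_scores, weights):
    """Combine component scores (all on the same scale) into one weighted score.

    Both arguments are dicts keyed by component name. The names used below
    are hypothetical examples, not drawn from any real evaluation system.
    """
    if set(component_scores) != set(weights):
        raise ValueError("components and weights must have the same keys")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(component_scores[name] * weights[name] for name in weights)


# Hypothetical teacher rated on a 1-to-4 scale for each component.
scores = {"observation": 3.2, "student_growth": 2.6}

# Half of the review based on student achievement.
half_growth = {"observation": 0.5, "student_growth": 0.5}

# A scaled-back share for student achievement (assumed 30 percent).
reduced_growth = {"observation": 0.7, "student_growth": 0.3}

score_half = composite_score(scores, half_growth)
score_reduced = composite_score(scores, reduced_growth)
```

For this made-up teacher, the final score comes to about 2.9 when student growth counts for half and about 3.02 when it counts for 30 percent, so where a state sets its cutoffs and weights can matter as much as the component ratings themselves.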
How have teachers' unions responded to new evaluations?
By 2011, the governing bodies of both the National Education Association and the American Federation of Teachers had issued new policy statements on teacher evaluation. In general, the teachers’ unions highlighted the potential of better evaluations to provide valuable feedback on teachers’ skills. But they remain wary about connecting the systems to teacher pay and tenure, and adamantly oppose the inclusion of students’ standardized-test scores in the systems.
In challenging the use of value-added models as part of evaluation systems, the teachers’ unions cite concerns about the volatility of test scores in the systems, the fact that some teachers have far more students with special needs or challenging home circumstances than others, and the potential for teachers facing performance pressure to warp instruction in unproductive ways, such as via “test prep.”
They also argue that it is unfair for teachers in nontested subjects to be judged by the scores of students they don’t even teach, as some states’ evaluation systems require. Concerns over the use of test scores in evaluations have fueled more than a dozen lawsuits targeting the new evaluation systems.
The pressure to use students’ standardized-test scores has also contributed to a recent wave of anti-testing sentiment, including the “opt out” movement. And indeed, standardized testing appears to have become more frequent as a result of evaluation pressures. Because only about 15 percent to 30 percent of teachers instruct in grades and subjects in which standardized-test-score data are available, some states and districts have devised or added additional tests.
The new evaluations were also rolled out alongside the Common Core State Standards and related exams, leaving teachers concerned about how the harder tests will affect their performance evaluations in the future. As a result of such concerns, some states, with federal approval, have pushed back the dates for attaching consequences to the reviews.
The teachers’ unions also frequently view teacher evaluation as part of a concurrent trend of outright attacks on educators’ livelihoods. Lawmakers, mainly Republicans, have made progress in scaling back collective bargaining rights, “fair share” fee arrangements, and automatic deduction of dues from members' paychecks. But Democrats, typically champions of labor priorities, have been among the supporters of the new teacher-evaluation systems.
Is there evidence that new teacher-evaluation strategies are working?
For all the energy spent on putting the new systems into place, the dividends paid by them aren’t yet clear. A few studies do show some preliminary evidence that teachers who receive high-quality feedback subsequently go on to boost student performance. One study on the District of Columbia’s IMPACT teacher-evaluation system found that teachers on the cusp of dismissal, or of receiving a bonus, generally went on to pull up their evaluation scores the following year.
Many of the states’ new systems are still being tested and refined, with their scoring mechanisms facing challenges both from those who think they are too lenient or incompletely implemented and from those who feel they are unfair or counterproductive. For that reason, teacher evaluation is likely to remain a contentious and central topic in K-12 education.
Terms to Know
Collective Bargaining: The process by which a district and a union representing teachers arrive at a contract spelling out work hours and conditions, salary, benefits, and processes for handling grievances. Often, contracts also set out details on professional development and other school initiatives, or supplement state law governing teachers. Contracts are legally binding.
“Last In, First Out” (LIFO): Many states and districts use seniority in making layoff decisions, despite pressure from some advocacy groups to base those decisions on performance, instead. Often, this process is referred to as “last in, first out.”
Teacher Observations: Most teacher-evaluation systems require teachers to be observed several times. State and local policies determine such details as the length of the observations, the mix of formal and informal visits, whether they must be accompanied by pre- or post-observation conferences, and who conducts them. Though generally principals and administrators are responsible for teacher evaluation, some districts include other teachers and even independent consultants or “validators.”
Teacher Tenure: When a teacher has completed his or her state’s probationary period successfully, he or she receives career status, sometimes known as tenure. (Most states have probationary periods of three years.) In general, tenured teachers can be fired only for a reason listed in state law. Districts must prove that they have met this standard during a due-process hearing. Due-process procedures typically differ based on whether the charges deal with misconduct or poor performance.
Value-Added Model (VAM): In the context of teacher evaluation, value-added modeling is a statistical method of analyzing growth in student-test scores to estimate how much a teacher has contributed to student-achievement growth. In general, VAMs factor in the gains the student was expected to make based on past performance, and in some cases, control for elements such as peer characteristics and background, including poverty level and family education.
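To make the value-added idea concrete, here is a deliberately simplified Python sketch. It assumes a single prior-year score per student, predicts each student's current score with an ordinary least-squares line fitted across all students, and treats each teacher's average gap between actual and predicted scores as the estimate. The function name, record layout, and sample data are hypothetical. The models states actually use typically draw on several years of data and additional controls, and this toy version does nothing to address the random error that critics highlight.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Estimate a simple value-added score for each teacher.

    records: list of (teacher, prior_score, current_score) tuples.
    Returns a dict mapping teacher -> average amount by which that
    teacher's students beat (or missed) their predicted scores.
    """
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    prior_mean, current_mean = mean(priors), mean(currents)

    # Fit current = intercept + slope * prior by least squares.
    variance = sum((p - prior_mean) ** 2 for p in priors)
    if variance == 0:
        raise ValueError("prior scores must vary across students")
    covariance = sum(
        (p - prior_mean) * (c - current_mean) for p, c in zip(priors, currents)
    )
    slope = covariance / variance
    intercept = current_mean - slope * prior_mean

    # Group each student's actual-minus-expected gap by teacher.
    gaps = defaultdict(list)
    for teacher, prior, current in records:
        expected = intercept + slope * prior
        gaps[teacher].append(current - expected)

    return {teacher: mean(values) for teacher, values in gaps.items()}


# Made-up data: both teachers start with similar students, but
# Teacher A's students gain 10 points and Teacher B's gain none.
sample = [
    ("Teacher A", 50, 60), ("Teacher A", 60, 70), ("Teacher A", 70, 80),
    ("Teacher B", 50, 50), ("Teacher B", 60, 60), ("Teacher B", 70, 70),
]
estimates = value_added(sample)
```

Because the prediction line is fitted to all students at once, the estimates are relative to the average teacher in the data: in this sample Teacher A comes out about 5 points above expectation and Teacher B about 5 points below.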
Teacher-Evaluation Research and Resources
- “The Widget Effect,” by Daniel Weisberg, Susan Sexton, Jennifer Mulhern, and David Keeling. This report from advocacy group TNTP documents uniformly high teacher-evaluation results and a very low number of teachers being dismissed for performance in the 12 districts studied. (View an Education Week summary.)
- “Measures of Effective Teaching,” project led by Thomas J. Kane. The study commissioned by the Bill & Melinda Gates Foundation examines the technical properties of several different gauges of teaching quality, including their ability to predict students’ test scores. (View an Education Week summary of the final reports.)
- “Error Rates in Measuring Teacher and Student Performance Based on Student Test-Score Gains,” by Peter Z. Schochet and Hanley S. Chiang. This federally financed study examines error rates in value-added measures of teacher effectiveness, concluding that misclassifications could be as high as 25 percent to 35 percent depending on the number of years of data used.
Education Week Resources
- “Tenn. Teachers’ Union Takes Evaluation Fight Into the Courtroom,” by Stephen Sawchuk. Tennessee’s teacher union is among those that have sued over the details of teacher evaluation. March 2014.
- “Teachers’ Ratings Still High, Despite New Measures,” by Stephen Sawchuk. Many revamped teacher-evaluation systems continue to show most teachers getting high marks. February 2013.
- “D.C. Teachers Improved After Overhaul of Evaluations, Pay,” by Stephen Sawchuk. Research indicates that teachers on the cusp of a poor evaluation or a pay bonus improved their performance. October 2013.
- “Contract Yields New Teacher-Evaluation System,” by Stephen Sawchuk. Labor and management came together in New Haven, Conn., to construct and implement a new teacher-evaluation system. November 2011.
- “New Teacher-Evaluation Systems Face Obstacles,” by Stephen Sawchuk. In 2009, there were few good working models on which to base reforms to teacher evaluation. December 2009.