Guest post by John Thompson.
A National Bureau of Economic Research (NBER) working paper, James Wyckoff and Thomas Dee's "Incentives, Selection, and Teacher Performance," does precisely what the title suggests. It studies the effect of former Chancellor Michelle Rhee's Washington, D.C., teacher evaluation system, IMPACT, on some indecipherable aspects of teacher performance, but it ignores the question of whether student performance improved. It shows that IMPACT has had an impact. The authors offer no evidence, however, that it was positive.
Of course, Wyckoff and Dee's methodology is the antithesis of the professed aims of reformers, who claim we should unflinchingly focus on students' outcomes, not adult interests. But the "carrot and stick" D.C. agenda requires some adults to intimidate others. So, economists measured what happens when teachers are told to "jump!" Metrics were developed to show that teachers lurched.
The only valid conclusions that can be drawn from the study's methodology were reported by the Washington Post's Emma Brown and Politico's Stephanie Simon. Brown explained, "Rewards and punishments embedded in the District's controversial teacher evaluation program have shaped the school system's workforce, affecting both retention and performance." But the report is "silent about whether the incentives have translated into improved student achievement."
Simon also recounted D.C.'s disappointing results in terms of student performance. She quotes Dee as saying that the idea is simply that carrots and sticks can have an effect: "This is a proof of concept."
So, it is not a criticism of Wyckoff and Dee to say they have written a study on behaviorist theory, not something with practical implications for school improvement. In the real world, performance evaluations mean that words and numbers are attached to an employee's name. The evaluations may or may not be valid approximations of the worker's effectiveness. Those metrics are likely to affect the person's performance. Whether they make him more productive is another question. Wyckoff and Dee chose to skip that question, however.
Wyckoff and Dee compared teachers whose evaluation scores were close to the dividing lines between being considered a high performer and a low performer. This method could have been the first step in a useful study. There is little actual difference in the so-called "effectiveness" of a teacher who is, for instance, just below the cutoff between "Effective" and "Minimally Effective" and one who is just above it. The difference under the D.C. IMPACT system is that the lower-rated teacher's job is at risk, so he has a strong incentive to change his behavior. Such a teacher has more motivation to raise his evaluation scores.
To my knowledge, nobody has ever doubted that such teachers would play the game and make changes to avoid termination. Whether teachers become more effective at teaching, however, is a completely separate question. Wyckoff and Dee make no attempt to address that more important issue.
In the real world, it should be obvious that under-the-gun teachers have more motivation to follow instructions precisely and teach more directly to the test. Similarly, threatened teachers are more likely to be obedient when writing lesson plans, articulating objectives and standards in precisely the right manner. When a teacher's job is at risk, he will work harder to do what evaluators think is important. More often than not, he will put more care into his "data walls" and "word walls," and conform to whatever the evaluator sees as the ideal presentation of those silly little details. In other words, at-risk teachers will bite their tongues, toe the line, and become much more compliant with the trappings of the observation process.
Under such evaluation regimes, test scores and teachers' value-added scores should increase. The only way those metrics, for what they are worth, would not go up is if the regime prompted educational malpractice that backfired, undermining the actual effectiveness of good teachers and running them out of the district. Regardless of whether the effort improves or damages teaching and learning, higher evaluation scores should result.
Because of IMPACT, the behavior of principals, other evaluators, and teachers has changed enough to raise teachers' evaluation scores by about 10 points on a 400-point scale. Perhaps such a small change requires more than just stepping up effort on busywork, or perhaps not. The more interesting question is whether teachers increased their "value-added." Perhaps the most interesting question is why Wyckoff and Dee do not focus on that issue ...
Wyckoff and Dee assume that IMPACT is good because it increased attrition among teachers whom they believe to be lower-performing. They make a big deal about lower-ranked teachers leaving the system, because "less effective teachers under the threat of dismissal are more likely to voluntarily leave." That would be a big deal if they had evidence that those who left were actually lower-performing. But effective teachers who are wrongly rated as "minimally effective" are also more likely to say "take this job and shove it" and leave.
It is safe to assume that some low-rated teachers were fairly evaluated and are actually less effective in the classroom, and that some good teachers were misidentified. Wyckoff and Dee have no clue who was correctly or incorrectly identified as low-performing. After all, IMPACT and other systems that use value-added are systematically biased against classrooms with larger numbers of English Language Learners, students on special education IEPs, and low-income students. It stands to reason that effective teachers who are "false positives," meaning they were inaccurately categorized, will behave like their colleagues who really are not effective. Both will leave the D.C. schools.
What do you think? Why did the D.C. schools not seek a fair and effective method of removing teachers who don't do their jobs? Why did they impose this corrupting "carrot and stick" system on everyone in order to drive off a few? And why did the researchers focus on metrics regarding teacher behavior and not student performance?
The opinions expressed in Living in Dialogue are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.