Regular readers know that I’m no great fan of simple-minded value-added systems. As we’ve seen just this week with the L.A. Times value-added brouhaha (which I hope to address in the next couple of days), it’s easy for would-be reformers to overreach or oversell (see “Pyrrhic Victories?” for a more extended take).
For the moment, though, let’s set all that aside. Michelle Rhee has been zipping around the country touting value-added metrics and merit pay. While we’re friends, I have some differences with Rhee’s unbridled enthusiasm on this question, but none of that justifies the bizarre hatchet job on Rhee that the Washington Post‘s Jay Mathews launched this week in a piece titled “Michelle Rhee’s early test scores uncovered.”
Here’s the deal: Seemingly without having done a lick of, you know, reporting to check his facts, Mathews breathlessly announced on Monday: “G.F. Brandenburg, a retired D.C. math teacher with an irresistible blog, has done it again. If he had chosen a career in journalism instead of teaching, no U.S. president would have finished out his first term. He has found the missing test score data from the former D.C. schools chancellor’s early years as a classroom teacher, something I did not think was possible. He has proved that Rhee’s results weren’t nearly as good as she said they were.” With that lede, Mathews managed to ignite a heated national debate.
Brandenburg is a D.C.-based blogger who specializes in vitriolic screeds, many of them targeting Rhee. He’s the kind of figure whose bombshells a respected journalist would do well to handle warily.
What did Brandenburg “prove” that so excited Mathews?
Brandenburg claimed to have located data that allowed him to follow “four different cohorts of students through Harlem Park Elementary [where Rhee taught], one of the Baltimore City public schools that was taken over by Tesseract/Edison company for several years in the early-to-mid-1990s and failed...I highlighted the classes where Michelle Rhee was teaching. In her last year, the scores did rise some, but nowhere near what she claimed. In her first year, they dropped almost as low as they can go. If Tesseract/Edison had been using the IMPACT evaluation system she foisted on DCPS teachers, she would have probably been fired after the first year!”
Mathews’ WaPo post, which took Brandenburg’s claims at face value, concluded: “Now we know how Rhee’s kids did. Their scores went up, it appears, but not that much.” Well, not quite. Turns out that Brandenburg’s results don’t tell us much of anything, and they certainly can’t speak to the efficacy of any particular teacher at Harlem Park. For those interested in reading the source material for themselves, check out the report “The UMBC Evaluation of the Tesseract Program in Baltimore City,” published in 1995 and written by Lois C. Williams and Lawrence E. Leak.
There are four huge problems with the data and the findings, any of which would have been sufficient to render the results meaningless.
First, the reported results weren’t actually for Rhee’s students. They were test results for a third-grade cohort, for which Rhee was one of several assigned teachers. Rhee apparently team-taught second- and third-graders, and was one of four third-grade teachers. No individual results, positive or negative, can be fairly linked to Rhee.
Second, the tests were apparently not administered to all students in the school (remember, this was well before the NCLB era), and it’s not clear which students were included or omitted. Moreover, no clear teacher-of-record information appears in the reported data, so it’s impossible even to determine what portion of the tested kids Rhee taught.
Third, given the grief Rhee took from those claiming that DC’s sophisticated, pioneering value-added approach wasn’t nuanced enough, it’s laughable to see Brandenburg using crude cohort comparisons (essentially eyeballing end-of-grade performance from different years) to judge student gains. Even those who wonder whether Rhee has placed too much weight on value-added metrics ought to acknowledge that she has been unequivocal in championing value-added measures that fairly reflect what individual teachers bring to the table. That makes it doubly bizarre to see her slammed with a calculation that neither Rhee nor any reputable scholar would consider a fair or meaningful way to evaluate teachers.
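For the wonkishly inclined, here’s a bare-bones sketch (in Python, with made-up numbers that have nothing to do with the actual Harlem Park data) of the difference between the cohort-to-cohort eyeballing Brandenburg did and the growth-for-the-same-students comparison that value-added systems are actually built on:

```python
# Invented numbers, purely for illustration -- not Harlem Park data.

# Crude cohort comparison: this year's third graders vs. last year's third graders.
# These are different children, so the difference mixes any teacher effect with
# whatever else differs between the two groups of kids.
last_years_third_graders = [42, 55, 61, 48, 50]   # hypothetical scale scores
this_years_third_graders = [38, 52, 70, 45, 47]

cohort_change = (sum(this_years_third_graders) / len(this_years_third_graders)
                 - sum(last_years_third_graders) / len(last_years_third_graders))

# Growth-style (value-added flavored) comparison: the same students, before and after.
same_kids_grade2 = [40, 50, 63, 41, 44]   # hypothetical prior-year scores
same_kids_grade3 = [46, 56, 71, 49, 51]   # hypothetical current-year scores
growth = sum(post - pre for pre, post
             in zip(same_kids_grade2, same_kids_grade3)) / len(same_kids_grade2)

print(f"Cohort-to-cohort change:              {cohort_change:+.1f} points")
print(f"Average growth for the same students: {growth:+.1f} points")
```

The toy numbers are chosen to make one simple point: the two calculations can point in opposite directions, because the cohort comparison is measuring different children, not how much anyone learned.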
Finally, the study’s authors report substantial unexplained fluctuations in the tested population--a huge problem if one is going to attempt to compare a cohort’s scores in one year to that age group’s performance the next. In 1993-94, Harlem Park had test scores reported for 376 of the 493 enrolled students (or 76%). The next year, Harlem Park had scores for 280 out of 440 students (64%). That kind of attrition can easily play havoc with any results. Brandenburg implies that the attrition is due to nefarious behavior, but provides no evidence. Moreover, the researchers note in the study that they excluded some students who were enrolled, and it’s unclear from the report whether this was due to special needs status, mobility, absence on testing day, or what-have-you. Might’ve been nice if Mathews had tried to contact the report’s authors and ask about this, no?
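If you want a sense of how much mischief that kind of churn can cause, here’s a deliberately exaggerated toy simulation (again in Python, again with invented numbers, not the UMBC data). Drop a non-random slice of the tested population and the school’s average jumps, even though not a single student learned anything more or less:

```python
import random

# Deliberately exaggerated toy example (invented numbers, not the UMBC data):
# what happens to a school's average when a chunk of students simply isn't tested.
random.seed(0)

# Pretend these are the "true" scores of roughly 490 enrolled students.
enrolled = [random.gauss(50, 15) for _ in range(490)]
full_mean = sum(enrolled) / len(enrolled)

# Now suppose only about 64% of them are tested, and the untested kids are
# disproportionately the lowest scorers (mobile students, absentees, etc.).
tested = sorted(enrolled)[int(0.36 * len(enrolled)):]
tested_mean = sum(tested) / len(tested)

print(f"Average if everyone were tested:        {full_mean:.1f}")
print(f"Average with ~36% (non-random) missing: {tested_mean:.1f}")
# The apparent 'gain' is pure selection -- no teaching effect at all.
```

Real attrition is rarely that lopsided, of course, but the mechanism is the point: compare two years with different testing rates and different missing kids, and you’re measuring the churn as much as the instruction.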
There’s no need to wade into questions of whether executives, coaches, or managers can only use metrics in good conscience if they themselves excelled by them, because there’s simply no way with these data to say anything, good or bad, about Rhee’s teaching performance. I don’t want to be unfair here--after all, it wasn’t until a later post that Brandenburg confessed, “My understanding of statistics is of an entirely elementary nature.” Still, it’s a useful reminder that a Washington Post reporter would be well-advised to check the reliability and validity of an accusation like this before running with it.
If the point is that using value-added is complex and fraught with dangers, I’m totally sympathetic. But skeptics would do far better to argue the merits and offer measured critiques than to engage in dubious attacks. Those concerned about the unfairness of carefully designed value-added systems like DC’s don’t do themselves any favors by suggesting they can judge Rhee’s own classroom performance using massively inferior data.