I’ve been meaning to do a longer postmortem on Florida’s Senate Bill 6. As I’ve noted before, I enthusiastically supported it even though I thought it a deeply flawed bill. The flaw? Its ham-fisted attempt to strip out one set of anachronistic strictures (governing tenure and step-and-lane pay scales) only to replace it with a set of test-driven processes that were almost equally troubling.
I’ll get to that eventually. But in travels to Boston and Houston yesterday, I had occasion to reflect on the often incautious faith in value-added assessment that underlies many efforts to rethink teacher evaluation and pay. The simple truth: value-added is an imperfect, imprecise, and fundamentally limited way to gauge teacher efficacy. That said, it’s still a helluva lot better than the status quo. If we’re honest and up-front about its limits and use it as a tool that should be handled with care, I’m an advocate. The trick is to be openly and proactively critical about tools like value-added (or merit pay or school choice) so that we can design them thoughtfully and anticipate problems.
Unfortunately, too often--as in Senate Bill 6 or in proposals that districts should start to publicize the value-added scores of individual teachers--we are not using value-added carefully. Instead, it is frequently treated as a casual cure-all. Last night, at a terrific all-star Philanthropy Roundtable dinner panel (Andy Rotherham, Kati Haycock, Tim Daly, Alex Johnston, and Houston supe Terry Grier), I found myself getting nervous at how casually strong numbers on grades three to eight reading and math value-added were being treated as de facto determinations of “good” teaching.
There are all kinds of problems with this unqualified presumption. At the most technical level, there are a dozen or more recognized ways to specify value-added calculations. These various models can generate substantially different results, with a third of each result varying with the specifications used. When value-added is calculated for a teacher in a single classroom, we frequently have only 20 or 25 observations (if that). The problem is that the year-to-year correlation of such results is only somewhere in the .25 to .35 range. Due to limited grade coverage, we can generate math and reading value-added for only about 25 to 30 percent of all teachers. And as schools get smarter about rethinking staffing and integrating virtual instruction, it’ll be increasingly difficult to attribute a student’s performance to a single teacher (this is already a thorny question for districts where students receive substantial pull-out instruction or work with a reading coach).
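For readers who want to see why 20 or 25 observations yield such unstable scores, here is a minimal sketch in Python. It is not any district's or researcher's actual value-added model. It assumes a deliberately simple setup: each teacher has a fixed "true" effect, and each year's score is that effect plus the average of independent student-level noise. The function names and the spread parameters (`teacher_sd=0.13`, `student_sd=1.0`) are hypothetical, chosen purely for illustration and not drawn from any study.

```python
import math
import random


def expected_year_to_year_correlation(n_students, teacher_sd, student_sd):
    """Correlation between two years of scores under the simple model:
    score = true teacher effect + average of n independent student noises."""
    signal = teacher_sd ** 2
    noise = student_sd ** 2 / n_students  # variance of a class average
    return signal / (signal + noise)


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_year_to_year_correlation(n_teachers, n_students,
                                      teacher_sd, student_sd, seed=0):
    """Simulate two years of scores for many teachers and correlate them."""
    rng = random.Random(seed)
    class_noise_sd = student_sd / math.sqrt(n_students)
    year_one, year_two = [], []
    for _ in range(n_teachers):
        true_effect = rng.gauss(0.0, teacher_sd)  # same in both years
        year_one.append(true_effect + rng.gauss(0.0, class_noise_sd))
        year_two.append(true_effect + rng.gauss(0.0, class_noise_sd))
    return pearson(year_one, year_two)


if __name__ == "__main__":
    # Hypothetical spreads, for illustration only.
    for n in (20, 25, 100):
        formula = expected_year_to_year_correlation(n, 0.13, 1.0)
        simulated = simulate_year_to_year_correlation(5000, n, 0.13, 1.0, seed=1)
        print(n, round(formula, 2), round(simulated, 2))
```

Under these assumed numbers, the correlation stays low for a class of 20 or 25 and rises as more observations are pooled, which is the intuition behind using more than one year of data. Real value-added models are considerably more involved, so treat this as an illustration of the sample-size problem and nothing more.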
A more fundamental question is how well reading and math assessments reflect the range of knowledge, skills, and habits of mind we want teachers to cultivate. I’m pretty sure they’re a good representation when it comes to serving academically at-risk students who historically haven’t been taught the basics; but I’m much less confident about how accurately they represent what we care about when it comes to students for whom foreign languages, music, or gifted instruction constitute a much larger share of the school day.
Now, we can measure teacher performance in various ways other than value-added scores. Such measures, along the lines that Tom Kane’s shop is piloting at the Gates Foundation, include familiar approaches like observation and more novel efforts like student feedback. Importantly, however, the validity of all of these techniques is being gauged by how tightly they correlate with grades three to eight reading and math scores. In other words, whether these are good measures of student learning is being gauged by how tightly they reflect value-added calculations. So, if our tests are flawed or are not capturing what we really care about, the proxy measures will be flawed or off-key in similar ways.
Again, I don’t think any of this is grounds for shying away from value-added. Using it is far, far better than the alternative. But it is, inevitably, flawed and imperfect. This means several things. One, we should be careful how we use it. Senate Bill 6 would have been much stronger had it knocked out tenure and step-and-lane pay but given school districts more leeway in designing evaluation and compensation. Two, we should take precautions like using a couple of years of data and building complementary tests and assessments. Three, we should not reflexively trust the assessments we have, but should sensibly and thoughtfully ask how reliable they really are.
And four, we need to do an infinitely better job of educating parents and policymakers about the problems with current practices that value-added can address, and then flagging the pitfalls of such reforms and explaining how they can be addressed. It’s going to be hard to win and maintain solid parental and legislative support for value-added systems--much less to win teacher backing--unless proponents can offer credible responses to concerns about inequities, inaccuracies, and distortions.
We have a tendency to fall in love with new solutions and then dismiss skepticism as opposition. That would be a serious mistake.
The opinions expressed in Rick Hess Straight Up are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.