Over the past couple of days I’ve run pieces by PARCC’s Jeff Nellhaus and SBAC’s Joe Wilhoft that helped illuminate how their consortia are going to address some key challenges when it comes to making sure that the new Common Core tests can carry the load they’re being asked to bear. I want to thank Jeff and Joe for their thoughtful, constructive, and illuminating contributions. I found the exchange somewhat heartening, after years during which my questions had been genially (and sometimes not so genially) brushed aside. And, to be clear, this exchange is less about PARCC, SBAC, or the Common Core in particular than about the practical challenges of introducing new, quasi-national, computer-assisted tests that are to be used for high-stakes decisions.
As has been the case all along, my primary concern is to have an open, transparent discussion so that I (and, more importantly, parents, educators, voters, and those making the decisions) are clear about how much weight these assessments should bear. This matters because these assessments are being used not merely for informational or instructional purposes, but to label schools, judge programs, evaluate teachers, and determine job security and pay.
Having read the responses by Jeff and Joe, I have a few additional queries. And let’s be clear: the questions that follow aren’t solely (or even mostly) for PARCC or SBAC, but also for the community of state officials, advocates, and educators committed to Common Core implementation. Indeed, these responses make clear that many of the challenges ahead are less questions of test design and psychometrics than of policy and practice. And I’ve long been puzzled that Common Core’s champions have pooh-poohed these questions rather than charging out front to sound the alarm, demand aggressive action, and make sure this stuff actually works. (On the other hand, I would’ve thought Affordable Care Act enthusiasts would’ve done that for Healthcare.gov, and they didn’t. So, who knows?)
1] Jeff notes, “Schools are responsible for creating testing conditions that are consistent with PARCC’s protocol for standardized conditions.” I get that. But what if they don’t? What if some districts are lax on this count, or disorganized, or choose to try to inflate their results (a la Atlanta) by manipulating testing conditions? Who is responsible for policing them? Presumably, the state. But if states do a poor job of this, does PARCC have any authority to act? If not, who is charged with seeing that states and districts abide by the protocol?
2] Joe notes there’s “not much evidence that physical setting” affects scores, and comments that SBAC has developed detailed test administration procedures. Cool. But I wonder whether, as the stakes increase, we’ll see more pressure to boost scores via questionable means. And just as state leaders had incentives to game things under NCLB, it seems they’ll have even more reason to countenance gaming now that their results will be directly comparable. Will procedures and manuals be a strong enough constraint to stop that? After all, whereas in the past a state’s only interest in monitoring assessment was to police its own districts, state officials will now have a vested interest in their districts outperforming peers in other states. Does this create incentives for bad actors? Who will keep an eye on this and make sure that some states/districts aren’t finding ways to exploit advantageous conditions?
3] Joe and Jeff note the importance of gauging the consistency of results across different devices and explain that the field tests are providing information on this score. As Joe writes, SBAC is collecting data to gauge whether different devices “have a differential impact on student performance.” Jeff notes, “If the research shows differences in performance that can be attributed solely to the mode of administration or device students are using, PARCC will need to consider alternatives for reporting the results. One alternative could be to report the results using different scales and establish concordance tables between these scales, much the same as used to compare results of the SAT and ACT.” He notes that PARCC is discussing these questions with its Technical Advisory Committee, but these are issues that will have serious implications for how the results are used to make consequential decisions. Is an ACT-to-SAT concordance approach sufficient to support high-stakes teacher evaluation? What other options are being considered? If the problems are real, does that suggest states should delay putting so much weight on these results? I’m left wondering whether those comparability fixes are strong enough to support high-stakes decisions--especially personnel decisions. My lawyer friends tell me that such systems could prove legally vulnerable, especially if there are claims of “disparate impact” in which teacher race correlates with the mode of test administration. Should this be a concern for state or district leaders?
4] As for testing windows, Joe simply noted that SBAC states have a 12-week testing window. Jeff explains that PARCC’s broad national testing window is meant to accommodate the fact that some schools open earlier or later than others, ensuring that tests are administered after approximately 75 and 90 percent of instruction. Fair enough. But even Jeff’s response doesn’t really offer insight into who (if anyone) is going to monitor or police that process. What’s the plan to stop states or districts from gaming the testing windows? Historically, one of the easiest ways to turbocharge test outcomes has been to manipulate the assessment calendar so that your students get more instruction than those in other states/districts. What’s the mechanism for policing against manipulation here? After all, one impetus for the Common Core assessments was the conviction that states were gaming their old NCLB tests. These same state officials will have the same incentives to game the new system. How will we know if they do? What’s to stop them from doing so?
I greatly appreciate PARCC and SBAC taking the time to respond, and hope that this exchange can spur a fruitful public dialogue about just how to ensure that these critical new tests will be able to shoulder the weight that policymakers are asking them to bear. I invite PARCC and SBAC to offer further thoughts in response to these follow-ups, if they’re so inclined. I’m also hereby offering to host a public conversation on these issues if PARCC, SBAC, CCSSO, NGA, Chiefs for Change, or other interested parties would like to take part.