It’s in vogue these days to declare the building blocks of statistical inference irrelevant to assessing the performance of schools. For example, Joel Klein recently argued that statistical significance is “a game.” Yesterday, Kevin Carey argued that accounting for sampling error - the statistical uncertainty that arises when a measure comes from a sample rather than the full population - in the context of NCLB is “silly” because “unlike opinion polls, NCLB doesn’t test a sample of students. It tests all students. The only way states can even justify using [margins of error] in the first place is with the strange assertion that the entire population of a school is a sample, of some larger universe of imaginary children who could have taken the test, theoretically.”
Dan Koretz, a Harvard education professor and author of Measuring Up: What Educational Testing Really Tells Us, provides a very clear explanation of why Carey is wrong:
A few readers might be wondering: if all students in a school (or at least nearly all) are being tested, where does sampling error come into play? After all, in the case of polls, sampling error arises because one has in hand the responses of only a small percentage of the people who will actually vote. This is not the case with most testing programs, which ideally test almost all students in a grade. This question was a matter of debate among members of the profession only a few years ago, but it is now generally agreed that sampling error is indeed a problem even if every student is tested. The reason is the nature of the inference based on scores. If the inference pertaining to each school...were about the particular students in that school at that time, sampling error would not be an issue, because almost all of them were tested. That is, sampling would not be a concern if people were using scores to reach conclusions such as "the fourth-graders who happened to be in this school in 2000 scored higher than the particular group of students who happened to be enrolled in 1999." In practice, however, users of scores rarely care about this. Rather, they are interested in conclusions about the performance of schools. For the inferences, each successive cohort of students enrolling in the school is just another small sample of the students who might possibly enroll, just as the people interviewed for one poll are a small sample of those who might have been. (p. 170)
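Koretz’s point is easy to see in a toy simulation. The sketch below (my own illustration, not from Koretz - the school’s “true” mean, the student-level spread, and the cohort size are all made-up numbers) tests every student in each of ten successive cohorts, yet the school-level score still bounces around from year to year, because each cohort is itself a draw from the larger universe of students who might enroll:

```python
import random
import statistics

random.seed(42)

# Hypothetical school: a long-run "true" mean score of 65 and a
# student-to-student standard deviation of 15, with 50 students per cohort.
TRUE_MEAN, STUDENT_SD, COHORT_SIZE = 65.0, 15.0, 50

def cohort_mean():
    """Test EVERY student in one year's cohort; return the school's score."""
    scores = [random.gauss(TRUE_MEAN, STUDENT_SD) for _ in range(COHORT_SIZE)]
    return statistics.mean(scores)

# Ten successive cohorts, all tested in full -- the school-level score
# still varies around 65 purely because each cohort is a different sample.
yearly_scores = [cohort_mean() for _ in range(10)]
print([round(s, 1) for s in yearly_scores])

# The expected year-to-year spread is the standard error: SD / sqrt(n).
print(round(STUDENT_SD / COHORT_SIZE ** 0.5, 2))  # → 2.12
```

Nothing about testing all 50 students eliminates this variation; it only eliminates uncertainty about that particular cohort. A margin of error around a school’s score is exactly an attempt to account for this cohort-to-cohort noise.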
Addressing complexities like sampling error is not just exploiting a “loophole” to avoid NCLB sanctions. Rather, it’s an assurance that when we label a school as “in need of improvement,” we’re not wrongly assigning that label. It strikes me as deeply ironic that even as NCLB endorses “scientifically-based” research, many wonks continue to turn their noses up at the central conventions of the science of statistics.
The opinions expressed in eduwonkette are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.