Mixing Politics and Testing

By Deborah Meier, September 20, 2007

Dear Diane,

You suggest I needn’t worry about annoying those “with more power.” But I felt bad recently when (as I mentioned) somebody took after Mission Hill School as a way to attack me on another issue altogether. So they can “touch me,” but not stop me! Alas, my travels remind me that others have less wiggle room, even for saying what’s on their minds.

You are right: our disagreements seem to lie in at least two places: (1) the role of standardized tests, and (2) a national curriculum vs. local ones. Even in these two areas our views overlap considerably. Let’s start with standardized tests.

When confronted by a disagreement between the judgment of a test vs. that of someone who knows a child well, I tend to rely on the latter. That’s based on the experience of sitting down with kids and going through items one by one. I was shocked. Their wrong answers were often a sign of intelligence, possibly misapplied, not a sign of reading difficulties. (Read Chapter 6 of “In Schools We Trust”.)

It’s also based, however, on the arguments made by Steve Koss in the Sept. 5 piece you sent me (“Tell me the results you want, and I’ll find a way to make numbers show it.”), in which he quotes test experts on the way these tests are designed. Since few understand p-values and the like, parents and laymen often believe that the scores rest on unquestionable assumptions.

But today we face a third dilemma. Standardized tests are no longer scored on a so-called national normal curve, but scored “politically.” This is not an accusation but a fact. Both the level of item difficulty and the scores rest on the judgment of an appointed group of people. This leads, for example, to scoring a 6th grader on how many items such “experts” believe he/she “should” or “could ideally” answer correctly, rather than (as in the old days) on how many the median 6th grader does answer correctly, the center of the “normal curve” of scores. The old approach led to a predictable curve. Today’s leads to anything we want.
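For readers who like to see the arithmetic, here is a minimal sketch of the two ways of scoring described above. Everything in it is made up for illustration: the ten raw scores, the three cut scores, and the function names are assumptions, not figures from any real test. It shows only that the same child and the same set of answers yield one percentile under norming, but a different passing rate under each cut score a panel might pick.

```python
from bisect import bisect_left


def percentile_rank(raw_score, norm_sample):
    """Norm-referenced: percent of the norming sample scoring strictly below raw_score."""
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, raw_score)
    return 100.0 * below / len(ordered)


def pass_rate(scores, cut_score):
    """Judgment-based: share of test-takers at or above a cut score chosen by a panel."""
    passed = sum(1 for s in scores if s >= cut_score)
    return passed / len(scores)


# Hypothetical raw scores (items correct out of 40) for ten 6th graders.
norm_sample = [12, 15, 18, 20, 22, 24, 25, 27, 30, 35]
student = 24

# Old style: the score is a position relative to other children.
print("percentile rank:", percentile_rank(student, norm_sample))

# New style: the outcome depends on which (hypothetical) cut score was chosen.
for cut in (20, 24, 28):
    print("cut score", cut, "-> pass rate", pass_rate(norm_sample, cut))
```

With these invented numbers, the student sits at the middle of the group under norming, while the share of children who "pass" swings widely depending on where the cut is set.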

Opinion, wise or otherwise, has replaced statistics. That’s all I mean by “political” norming. At its worst it leads to the results the New York Daily News exposed with regard to N.Y. state’s math test having been made easier this past year. It accounts for test scores all over the country that bear no resemblance to each other, as different authorities make different decisions about how hard a test should be and how many right answers are needed. All test-makers know ahead of time how easy or hard it is to get the predetermined right answer for every single item. The selection of items decides the scores. The final scores are no surprise to them; if they were, the test-makers would be incompetent.

The old form of norming is what has made such tests seem incorruptible and scientific. It led to other flaws (such as the fact that scores could never, en masse, go up or down for long without renorming). The new system is living off the reputation of the old-style tests, while in fact it rests on a much messier form of human judgment.

I like that messy form, but want to reserve it for when it’s a sensible response. The kind done by individual teachers (grades on papers and exams done for a class, etc.) and the kind done by panels of experts of various sorts looking carefully at individual performances both acknowledge being less and more than “scientific.” The final “score” acknowledges this by being “signed off” on by real-live names (to whom one can appeal), which is why the criteria for assessment are public and transparent. Such approaches (used even for PhDs) are much closer to the way we go about making decisions in “real” life; they respect the power of human reasoning in ways that standardized tests, normed or otherwise, cannot. But each has its flaws.

Some combination of these forms, including the judicious, occasional use of standardized norm-referenced tools (or a third variant we have yet to invent), could serve us well, with none having absolute power over the others, while all together inform our judgment. The argument for a more nuanced system of assessing young people shouldn’t seem like anti-testing dogma.

How far apart are we on this, Diane?

Why do I fear lists of state-imposed required songs, historic dates, book titles, etc.? You have more faith than I do in our capacity to keep these “common” lists short. My experience suggests nothing ever gets taken out and much gets added in. I suspect we also disagree on how hard some things are to “learn,” and maybe on what we mean by that. Besides, I have an appalling rote memory, so memorizing the Gettysburg Address would require eliminating a lot more from the curriculum for me than it might for you. And finally, we may disagree about how likely such lists are to be used well vs. abused.

But there’s no question that the folks making decisions for us now, at least in NYC, don’t know the kids well, don’t know much about what is easy or hard to learn, have never studied psychometrics, and haven’t thought long and hard about the trade-offs involved in trying to create a form of education that serves democracy well. And they are not embarrassed about it.


The opinions expressed in Bridging Differences are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.