The Heart of Our Testing Disagreement

By Deborah Meier, March 14, 2013

Updated to correct errors on March 15.

Dear Eric,

I think you’ve picked the heart of our disagreement, or maybe I should say the “brain” since heart seems wishy-washy!

I might never have changed my views if I hadn’t had the experiences I did—from going into the classroom to having a son who had trouble with testing, and had it not been for those interviews I did of 2nd and 3rd graders. With those experiences behind me, and having had access—as teachers no longer do—to the manuals that used to go with the exams to explain them to teachers, and to some ETS test experts who agreed with me, and all the subsequent work I’ve engaged in, I was forced to change my mind.

My prejudices in favor of standardized tests included my own high test scores and the fact that it went along—historically—with the progressive education movement. Maybe more so its “administrative” wing, but I think even Dewey saw them as a potentially useful ally to progressive schooling. I need to look into that. Maybe you know the answer?

It’s not “ultimately” that we simply differ on the “weights” given.

1. Whether scores rise or fall wouldn’t influence me because I think they are phony measures. If I thought we could close the gap so that the rank order didn’t correspond to race or class I’d buy it just to end the conversation about race, class and intelligence. But, at some point they’re sure to up the scale—creating more differentiation at the top end and thus recreating the same rank order. Re. the market economy and workplace: I can’t imagine what kind of evidence you have in mind except that those who fall out earlier along the K-Ph.D. process are more likely to be unemployed and more likely to earn less money—and more likely to score poorly. Surely that’s a correlation I can explain without recourse to scores.

2. Yes, they also lead to a narrowing of curriculum and—in the process of our definition of being well-educated—overemphasize questions that have clear right/wrong answers. When such answers are not clear, bias begins to weigh in because interpretation is required. And it’s precisely when uncertainty seeps into questions that the highest form of education—starting with 4-year-olds—takes place.

3. Can this narrowing be overcome by better tests? Not too likely, and if it can, it’s by turning good kinds of questions into formulaic rubrics! Back to where we started.

4. The pool of items you describe might be fun for teachers to occasionally use, with a change every year so that they could be public. Then, eliminate high stakes. Maybe such a tool would be useful and could even be used to help kids see where there are huge gaps, etc. in their knowledge base. I always enjoyed giving some tests like that (population of the United States, of New York City, percentage of black Americans, etc.) as pop quizzes to see how kids read the larger world. Amazing results, to explore another time.

We use a jury or panel of interviewers in assessment processes for graduation from Mission Hill (K-8), for example. Aspiring graduates present a portfolio of their work in a specific “discipline,” and then present one of these to a panel of five and proceed to “defend” their work. Then the panel votes (fail, needs minor revisions, pass). Students who pass may return for an honors presentation. The “needs minor revisions” category is for stuff like incomplete citations and grammatical/spelling errors, which do not require the full committee reassembling. We do something like this, as preparation for this test, all the way from kindergarten through 8th grade, and in the case of the high school, through 11th. We also supplement these assessments with some more familiar tests: a timeline in history, a geography quiz, a quickie arithmetic test in math, etc. All these could be retaken, and frequently were. And a few students spent an extra half or full year with us, while also taking some high school or college courses. We were in no hurry for them to leave, but we, the faculty and families, counted on our reliability at assessing “readiness.” (One of the “tested” areas related to job-like skills and apprenticeship-like experiences they had: deadlines, asking for help, taking initiative, working among adults, letters of reference, resumes, etc.)

Do schools today get data on the amount of “measurement error” when reporting scores, or is that reserved for election predictions? Reliability is highly questionable on standardized tests, while the stakes are very precise. The shift from “percentile” scaled scores (and then scores made equivalent to grade levels) was easier to understand, if one tried. Current “norming” is based on something more elusive.

In short, scores used to have nothing to do with “should,” but were just reporting one’s standing in relationship to others. That more or less required them to be re-normed every few years in case all kids were getting better at the test. (We knew, in New York City, that any time we changed the test-maker, scores would go down. Scores also went down between the end of the old 6th grade and the first year of junior high. Both having nothing to do, I would contend, with the skills being tested.)

When we don’t know someone’s score, Eric, how do you and I ordinarily assess their intelligence or expertise? Probably in ways quite similar to Mission Hill’s graduation exercises and not at all like the way schools presently “assess” their students: by how they can use their intelligence.


The opinions expressed in Bridging Differences are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.