The Alphabet Soup of Scores
I grew up believing numbers were true. "Math is beautiful," my uncle would say. "With all the uncertainties of life, math provides a place where there is such a thing as one right answer." I didn't grow up to be a math teacher like my uncle--I teach English--but I understood what he meant. Recent months, however, have shaken my childhood understanding of numbers-as-truth as I have learned about changes in scoring for both the Maine Educational Assessment, my state's achievement barometer, and the national college-entrance standard, the Scholastic Assessment Test.
In March, my 11th graders took the Maine Educational Assessment, or M.E.A. A few weeks later, I joined the Maine language-arts consultant Nancy Andrews and several other colleagues to begin selecting "anchor papers" for the essay-reading sessions of the test-scoring scheduled in April. These anchor papers represent the designated categories on the scoring guide, ranging from 1 (weak) to 6 (stellar). Our task was to identify approximately a dozen representative essays in each category in order to provide a range of samples for those who would do the actual reading and scoring.
During a mid-morning break, one of the other readers commented that his school's M.E.A. writing scores had gone down last year and yet he felt confident that the school's writing program was as strong as ever. He was puzzled. Nancy Andrews said that this decline was to be expected. Schools with low scores were improving and students throughout the state were writing better. She repeated several times that she'd had a difficult time finding "low" essays to include in the day's collection of samples. Papers that now earn a 1 or 2 used to earn scores of 3 or 4; that means that a paper which once would have scored in the top half, at 4, now might earn a next-to-the-bottom score of 2. So if a school's students continue to produce work equivalent to past years', their scores will go down.
We sat in stunned silence. One teacher asked, "Have you publicized this change?"
Nancy Andrews said she has given public presentations about the overall improvement in writing by 11th graders in Maine; at one, she simply read a sampling of "1" papers produced over the years. Hearing the lowest-scoring essays move from incoherent, one- or two-sentence papers into full-page paragraphs did more than any commentary, she said, to demonstrate the improvement.
Statewide improvement in writing is indeed cause to rejoice. But most of us there had not realized that essay-scoring has been "renormed" as the quality of the work changes, and that because the criteria for scoring have shifted to reflect improved writing, scores at successful schools may actually decline, even though the writing stays as good as ever.
The "truth" of what a numbered score once meant shifted under the weight of words. In learning to understand renorming, I had to let go of my childhood sense of numbers as unchanging absolutes.
Within the world of American educational test-makers, the only "unchanging absolute" has been the S.A.T. Other tests, I learned, are regularly renormed as the pool of test-takers shifts, such as the M.E.A. writing scores changing to reflect increased competence. But the S.A.T. has not been renormed in over 50 years. In other words, a score of 500 on the verbal or math section of the S.A.T. would mean the same thing whether it were earned in 1944 or 1974 or 1994. This consistency is part of what has given the S.A.T. its stability and credibility over the years. Colleges know what the scores mean. Even parents who look at their old scores and then at their children's scores know they are comparing the same thing.
Recently, however, I learned along with other educators that the S.A.T. has succumbed to the pressures of psychometrists, those who study the science of measurement. On April 1, 1995, the S.A.T. is going to be readjusted to make 500 the mean for both the math and verbal sections, just as it was back in 1941. This readjustment will shift the average test-taker's scores upward by 70 to 80 points on the verbal section and 20 to 30 points in math, roughly 100 points overall. The only scores not affected by this shift will be top scores; they will simply stay high. Like every other test, the S.A.T. is being renormed.
What makes all this hard for the average person to grasp is that in the process of renorming, better writing in Maine lowers the scores of the average Maine Educational Assessment test-taker; yet decades of declining scores nationwide will now raise (by 100 points) the scores of the average S.A.T. test-taker.
Shaking my head in grim amusement, I called the president emeritus of the College Board, George Hanford. I prefaced my questioning by remarking that the "recentering" (as it is called) of the S.A.T. strikes me as politically correct and absolutely wrong. It smacks of inflated "self-esteem": "Oh, are scores low? Do people feel bad? Let's just raise them a hundred points and make everyone feel better about American education."
President Hanford suggested that I might be overreacting, but he agreed that this recentering of the S.A.T. is an irreparable mistake. "My reason," Mr. Hanford said, "is that one of the few things that has remained immutable as an educational benchmark is the S.A.T." He directed me to his book, Life with the S.A.T., in which he had written, "Although most tests are regularly renormed and thus have averages that change over time, the S.A.T. isn't and doesn't." He went on to say that he understood the psychometric reasoning behind the current changes, that from a purely technical point of view, the renorming can be justified. "But," Mr. Hanford reiterated, "once this happens there will be nothing that will be an anchor to windward or a permanent benchmark in American public education." The S.A.T. will become like any other test, which serves the marketing purposes of the test-maker, and renorming this time opens the door to continual renorming. Mr. Hanford, who happens to be my father, concluded by saying, "I think it's a cop-out."
After hanging up the phone, I thought about this new word I'd learned relevant to changes in scoring of the Maine assessment and now the S.A.T.: "renorming." Numbers are not absolute. What renorming does is measure today against itself. What students will get in both cases is not measurement against some absolute standard, but measurement against other students taking the test--whichever test--today. This, I had learned, is the clear intent of the Maine Educational Assessment, this measurement of each 11th-grade class against other members of that class. But the S.A.T.?
As I thought of my uncle's love of math's "one right answer," I shuddered. Numbers, I realized, can be as slippery as words. Perhaps more so. I wished my uncle were still alive so I could tell him to give up numbers for words, to go read Moby Dick instead of Euclid. At least, in Moby Dick, the whale wins every time.
Vol. 14, Issue 06, Page 32. Published in Print: October 12, 1994, as The Alphabet Soup of Scores.