Opinion

‘Standardized,’ You Say?

By Todd Farley, November 17, 2008

I’m always amazed by the certainty with which staunch advocates of standardized testing view the results of those large-scale assessments. This past September, for example, U.S. Secretary of Education Margaret Spellings said that “according to the Nation’s Report Card, since 2000, more kids are learning reading and math.” She made the claim as if it were indisputable fact.

I, meanwhile, have spent the last 14 years scoring student responses to open-ended questions on standardized tests (including nearly yearly work on the National Assessment of Educational Progress tests called “the nation’s report card”), so I view any such results with considerably more skepticism. In fact, I’m not certain the industry that’s employed me for the last decade-and-a-half has ever produced the results for a test—as far as I can tell, they’ve only produced results.

There’s not enough column space in this newspaper to list the myriad discrepancies I’ve seen in the scoring of short-answer/essay questions on “standardized” tests, but in my opinion, test scoring is akin to a scientific experiment in which everything is a variable. Everything. In my experience, the score given to every open-ended response, and ultimately the final results given to each student, depended as much on the vagaries of the testing industry as they did on the quality of student answers.

To start, those student scores would depend on the scoring center where a test was read, whether one in Iowa populated with liberal whites, one in Arizona filled with conservative senior citizens, or one in Virginia peopled more with African-Americans and military personnel.

Those student scores also would depend on what point in a project a test was assessed (either before some rule got changed, or after), what time of day it was read (hopefully not until after the morning coffee had kicked in, but before the fog of daily boredom had crashed down), and what cubicle it was sent to (one whose trainer was more stringent in interpreting scoring rules, or one whose trainer had a more tolerant perspective).

Ultimately, those scores would depend on which temporarily employed “professional scorer” assessed each student response—whether one of those workers who actually understood the rules and doled out the points accordingly or, more likely, one of the dingbats and dilettantes I worked with over the years who pretty much had no idea what they were supposed to be doing. Seriously, who else does anyone think is doing that short-term, high-stress, low-paying job?

During my career, I did work with plenty of temporary scorers who were intelligent and accomplished people, including those working part time as they went to law school or medical school, teachers working night shifts after a day in the classroom, one guy whose debut short-story collection was already on the shelves of Barnes & Noble, and another running the 400-meter in the Atlanta Olympics. Mostly, however, I worked with people who were not particularly smart or accomplished. I worked with scorers who, for example, were too daft to recall the scoring rules—from 1994, when one friend could never remember “riding in a single file” was an acceptable bike-safety rule, to 2007, when an avuncular co-worker was forever stumped that he could credit “no hope for the future” but not “no hope for the past.”

I worked with scorers whose knowledge of the English language was so suspect I doubted their ability to comprehend any student response, let alone to recognize the subtleties of either proper English or its American vernacular (a Japanese woman not knowing that “irksome” has a negative connotation, a Middle Eastern man failing to understand what “grossed out,” “bummed out,” or “feeling it” meant). I worked with scorers who continued to score tests after completely failing to understand the training, either because of physical ailments (a guy who had 25 percent hearing loss, another with limited short-term memory), a lack of common sense (people crediting “dirt,” “mud,” or “Styrofoam” as a student’s favorite food), or perhaps the onset of senility (one grandmother giving grades of 4 to student responses that were obviously 1s, and 1s that were obviously 4s).

I worked with a scorer who told me his “real job” was as an “ultimate fighter” (those bruisers who crawl into an octagonal ring to engage in bare-knuckled brawling for the enjoyment of the American viewing public). And while he was a nice guy, his mind worked about as quickly as you’d expect from someone who’d gotten punched in the head a lot. After three weeks of scoring student responses to a state reading test, he waved me over to his computer to ask me exactly what he was being tested for. Was it psychological, he wondered? I had to explain to him—after he’d been working for 15 days and had scored thousands of student responses—that he wasn’t being tested, the students were.

“I’m deciding if the kids did good?” he asked. “I’m deciding if they’re smart or not?”

“Basically, yes,” I smiled. “Yes you are.”

“Wow,” he said, shaking his head in disbelief. “Me? Wow.” Wow indeed.

I don’t want to be too smug about my own superiority or too unkind about people who have been my fellow scorers. Still, it’s important to remember, when talking about the almighty standardized test, that many of the people who will actually read and assess student responses might have ended up at a scoring center because they had trouble getting a job elsewhere. They had college degrees and time on their hands, so they found work scoring standardized tests.

And lest anyone think I’m being hyperbolic, doubting that such a motley crew of ne’er-do-wells could ever do the sort of professional work the assessment industry surely guarantees, do remember that the statistics that prove the “standardization” of the test-scoring process are numbers often controlled by temporary supervisors whose lives are made easier by producing just such results. In other words, if a supervisor and his scorers can’t go home some day until the team’s “reliability percentage” (agreement between scorers) hits 80 percent, trust me, that threshold will be reached. It’s not that hard. As a friend of mine once quipped, “I can make statistics dance.”

Dancing statistics, however, are a story for another day. Let’s conclude here simply by agreeing that the only certainty there should be regarding standardized-test scores is the certainty they’re not indisputable.

A version of this article appeared in the November 19, 2008 edition of Education Week as ‘Standardized,’ You Say?
