Enemy of Innovation
Our obsession with standardized testing is impeding reform
In 1987, a team of teachers, administrators, university faculty members, and technology gurus, bound by no tradition, gathered in St. Paul, Minn., to design a public school from scratch. Gone would be classrooms with forward-facing desks, 50-minute class periods, report cards, required textbooks, grade levels, and lengthy summer vacations.
Instead, this new school would be a place where students, working with teachers and parents, would identify their strengths, needs, and goals and create their own learning plan. Through projects that could reach into the community, students would find, organize, and make sense of information instead of just passively absorbing what the teacher and textbook presented to them. Home base for these real-world students would be nothing like the traditional school building: Instead of classrooms, students would work in labs, wired with video and computer networks, and enormous cooperative learning spaces. Students would regularly demonstrate their progress in areas such as reading, writing, and problem solving through what they produce—written work, video presentations, speeches, and computer programs.
Dubbed the Saturn School, after General Motors' break-the-mold approach to making cars, the 4th-8th grade school opened its doors in 1989 exactly as the design team conceived it—except for one vestige of the factory-model school that it could not throw out: norm-referenced, standardized testing. As a result of that one holdover, the true Saturn vision may not survive.
Immersed in the exciting business of learning to use their minds, Saturn students have not fared well in the trivial-pursuit world of standardized testing. Their scores in spelling and math computation declined significantly in the first two years. So, despite a shower of accolades from students, parents, and the steady stream of visitors, including President Bush, the school has been given the educational equivalent of the ultimatum, shape up or ship out: Get test scores up, teachers have been told, or the Saturn project will be terminated.
Standardized testing and the traditional factory-model school were made for each other. In the latter half of the 19th century, the primary purpose of schooling evolved from producing an educated elite to training for industrial America the masses of immigrants and rural poor flocking to the cities. To fulfill that mission, schools were organized like assembly lines. Students would pass through grades, acquiring the nuts and bolts of knowledge as they progressed. The most basic skills would be taught first through drill and practice; material of increasing complexity would be added as students moved through school. Relying heavily on textbooks and locked into carefully sequenced curricula, teachers would efficiently transmit pre-packaged information to docile students.
The public schools, Harvard University President Charles Eliot declared in 1908, should sort children according to their “evident and probable destinies.” The standardized test was the scientific and effective tool for accomplishing that.
Both the factory-model school and the norm-referenced standardized test have proven remarkably durable, despite cognitive research that strongly suggests traditional schools don't teach the way children learn and the tests don't effectively measure real learning.
After years of study, researchers and psycholinguists have concluded that children constantly engage in a search for meaning, structure, and order and that schools should support their natural inclination to develop and test hypotheses about the world around them. The development of thinking skills does not have to wait until students have mastered the basics; in fact, higher-order thinking and the mastery of knowledge are inextricably linked and mutually supportive.
The new insights fostered by research in learning have nourished such grassroots initiatives as the whole-language movement and cooperative learning and have begun, especially during the past decade, to transform classrooms in hundreds of schools across the nation.
But these new and more sensible approaches to teaching and learning are not likely to spread rapidly or endure for long if they are evaluated on the basis of student performance on the norm-referenced standardized tests that are dominant in American schools. Lauren Resnick, a cognitive psychologist and director of the Learning Research and Development Center at the University of Pittsburgh, says the best way to ensure the success of reform “is to attack directly what is one of the most powerful dampers to the kind of change we need: the current testing system.”
“Talk to teachers who have caught on to the idea that the kind of teaching required in a 'thinking curriculum' is possible,” Resnick says, “and then ask them what is the biggest barrier to it. Their answer every time is, 'Those standardized tests are coming, and I'm afraid my kids won't pass them.'”
Teachers are in a quandary: They are urged to take risks and be innovative, but they know that their students will be judged on how well they score on tests that do not measure innovative teaching and learning. Monty Neill and Noe Medina of FairTest, a watchdog organization, make this point in an article in Phi Delta Kappan. Research shows, they argue, that “teaching behaviors that are effective in raising scores on tests of lower-level cognitive skills are nearly the opposite of those behaviors that are effective in developing complex cognitive learning, problem-solving ability, and creativity.”
Not surprisingly, America's near obsession with standardized testing has had a chilling effect on education reform. In Testing in American Schools, a comprehensive 1992 report on the subject mandated by Congress, the U.S. Office of Technology Assessment warns that standardized testing is an enemy of innovation and that it threatens to undermine many promising classroom reform efforts. “Many teachers, administrators, and others attempting to redesign curricula, reform instruction, and improve learning feel stymied by tests that do not accurately reflect new education goals,” the study states.
Principal Pamela Clark is one such administrator. Over the past six years, Clark transformed the program at Sunnyslope Elementary School in Phoenix. Her campaign to educate the whole child has included making sure each youngster has enough to eat, adequate medical care, and appropriate social services. She has linked the school with community service agencies, recruited a social worker for her students, and drummed up parent support in a highly transient, poor white neighborhood. As a result of the efforts, students are spending more time reading, writing, and learning from each other.
Parents are almost unanimous in their support for the school. Visitors tell Clark that Sunnyslope is on the cutting edge of school reform. Still, the principal is on the hot seat: Her students are not showing significant improvement on standardized tests. “We've taken the heat,” Clark says. “People visit and say what we are doing is wonderful and developmentally appropriate, but we're sitting here with blistering fannies.”
Pressure to raise test scores in groundbreaking schools comes in nerve-racking waves, observes Carole Edelsky, professor of curriculum and instruction at Arizona State University. Every once in a while, she says, there is a “whole flurry of activity” during which teachers and principals have to defend themselves against accusations that they aren't really teaching anything because the test scores aren't going up. Things quiet down for a while, and then there is more turmoil, she says. “Then it's OK again.”
This grip that standardized testing has on American education is bad enough, but what makes it even worse, many educators argue, is that the tests themselves are seriously flawed. Neill and Medina write: “The use of standardized test scores as the primary criteria for making decisions of any kind is reckless, given the erroneous assumptions that undergird standardized tests, the limited range of skills and knowledge that they measure, their limited reliability, their lack of validity, and the impact that race, ethnicity, family income, and gender exert on test results. Yet just such reckless decisions seriously damage student achievement, the curriculum, and education reform in many schools and districts.”
The nation's roughly 44 million students take a total of 127 million tests a year, for an average of three standardized tests a year per student, according to a report by the National Commission on Testing and Public Policy, titled From Gatekeeper to Gateway: Transforming Testing in America. A student sitting down with a No. 2 pencil in hand is most likely to encounter one of the “big four”: the California Achievement Test, the Iowa Test of Basic Skills, the Metropolitan Achievement Test, or the Stanford Achievement Test.
These four, and many other smaller tests, share some common characteristics. They are standardized; that is, they ask the same questions across different populations to permit comparisons. They are norm-referenced, which means the items are chosen not to establish how much students know of what they ought to know, but rather to highlight differences in students so they can be ranked against others in their age group. And the tests are made up primarily of multiple-choice items.
Standardized tests are marketed as scientifically developed instruments that objectively, inexpensively, and reliably measure students' skills. States and school districts are buying the pitch—and the tests. The national commission on testing estimates that test preparation and administration consumes some $100 million of tax money each year and that the nation's students spend a total of 20 million school days a year taking tests.
Although norm-referenced standardized tests came into use just after the turn of the century, they were not employed by a majority of the schools until the 1930s. And it was not until the 1960s and '70s that standardized tests began to be used widely. A position paper on the subject by the Association for Childhood Education International points out that few students who graduated before 1950 took more than three standardized tests in their entire school careers. But today's graduates will have taken up to 36 standardized tests during their 12 years of schooling.
The recent explosion in standardized testing was triggered during the 1980s by the reform movement's demand for greater accountability. To garner support for sweeping education initiatives and the bigger budgets needed to pay for them, lawmakers had to promise constituents that reforms would pay concrete dividends. Test scores, they said, would provide the proof. By 1985, two years after the publication of A Nation at Risk, new testing laws had been passed in 30 states. By the 1989-90 school year, 47 states had mandated standardized testing. And even in the three states that had not, many districts required standardized tests, according to the OTA report. In Pennsylvania, for example, 91 percent of districts used standardized tests even though there was no state testing requirement.
“There has been a dramatic increase in the use of students' scores to hold school systems, administrators, and teachers accountable,” the national commission on testing's report states. “Thus, not only has the volume of testing increased, but testing now looms more ominously in the lives of many educators and children, influencing what they teach and how, and what they learn and how.”
Today, test scores are treated as if they were magic numbers. Newspapers rank schools and districts by their scores. Real estate agents pitch test scores to sell houses. Some districts have even fired school administrators because of test results. The principal of the Dool School in Calexico, Calif., for instance, was fired when test scores fell the year after he implemented a whole-language program.
“When you take a simple little number and elevate it to the status that it is elevated to in this particular culture, it is very destructive,” says Peter Johnston, associate professor of education at State University of New York at Albany. “While people may say it's only one of a number of indicators, it happens to have a very privileged status.”
One reason the public holds test results in such high esteem is that the government and education researchers routinely use them to evaluate the worth of schools and programs. Says Edelsky: “The prevailing wisdom—you have to search so hard to find someone who doesn't believe this—is that the way to evaluate the success of anything is via tests.”
Federal funding—including funding for Chapter 1—is often contingent on schools meeting and maintaining specified achievement levels. Eva Baker, codirector of the National Center for Research on Evaluation, Standards, and Student Testing, says this may be the main reason so many states require testing. But even this use of tests can wreak havoc on reform. One of Arizona's top 10 schools, Granado Primary School, which has been recognized as a “Lead School” by the National Council of Teachers of English, was forced to re-evaluate its program or lose its Chapter 1 funding, based on the results of standardized tests.
The research community also puts a high value on standardized tests; in fact, the bulk of education research is based on test score data. When a researcher wants to know if a particular teaching method is effective, he or she usually compares test scores of students taught with the new method with those of a control group. If the scores of the students taught with the new method are higher, the researcher feels comfortable saying, unequivocally, that the approach is more effective. “It's such a tradition in educational research and in education,” Edelsky says. “It fits so well a cultural search for and acceptance of quick answers.”
Even some key members of the testing community believe that standardized tests have been accorded too much power. Gregory Anrig, president of the Educational Testing Service, for example, has decried the overuse and misuse of standardized testing. “When I was in the Army,” he says, “the order was: 'If it moves, salute it. If it stands still, paint it.' Now if it stands still we say, 'Test it.'”
He and others complain that tests are being used to make decisions they were never intended to make, determining the fates of students, teachers, principals, programs, and whole schools. Parents, researchers, policymakers, and the public have become almost totally reliant on test scores as a measure of achievement. Whether kids actually learn is less important than how well they do on tests.
“Do schools and policymakers ask too much of these tests?” asks H.D. Hoover, an author of the Iowa Test of Basic Skills for 25 years. “God, yes. I'm tired of seeing the tests I've worked on used for things they were never intended to do. They are using them to make policy decisions that the tests are not good at making.”
The primary purpose of these tests, Hoover says, is to give parents and teachers an external view of a child's performance. Having been educated in a one-room schoolhouse in the Ozarks, Hoover knows how isolated schools can be. “Kids may be knocking the socks off the local district,” Hoover explains, “but compared with other kids in the rest of the United States, how are they doing?”
But many teachers argue that standardized tests cannot provide reliable comparative data; test scores, they say, do not always give an accurate picture of students' accomplishments. “Tests measure what they were designed to measure: what goes on in a school that delivers a traditional textbook curriculum, with kids in packages of 30,” says Saturn School project director Tom King. “They have limited usefulness in a school where kids are involved in activity-oriented, cooperative learning, out in the community, doing things with their hands and minds.”
Saturn evaluator Hallie Preskill, a professor at nearby St. Thomas University, elaborates: “You can't test how students solve a problem with other people or by themselves, how they access resources, how they develop ideas, and so on.”
Two students in Mark French's math class at Saturn illustrate the point King and Preskill are making. One, a learning-disabled student, didn't score well on standardized tests when she started with French two years ago and still doesn't. “But now,” he says, “this person thinks for herself. She works in groups, she takes initiative, she is motivated, she is prepared. She is not shy and meek and afraid anymore.” What's more, she can demonstrate academic achievement. “She can stand up in front of the class and give a speech,” he says. “She can explain and demonstrate a computer project on geography.”
The other student tests poorly but is an incredibly bright, meticulous worker. He doesn't get very far on the tests, the teacher says, because he is so careful and has difficulty with fine motor skills; he always has to go back and clean up his answer sheet. “But,” French says, “he constantly challenges me as a teacher by what he can do in class, the questions he asks, and his thought processes.”
Although Hoover acknowledges that tests can't reflect everything that goes on in a school, he insists that tests like ITBS are a valid measure of a child's achievement. “People who say that you can only measure facts and low-level thinking on tests like these are just plain wrong,” he insists. To bolster his argument, Hoover notes that the ITBS reflects the National Council of Teachers of Mathematics' new standards, which encourage the use of calculators, computers, and other tools to help illuminate the intricacies of mathematics rather than simply focusing on the mechanics of computation.
But when George Madaus, director of the Center for the Study of Testing, Evaluation, and Educational Policy at Boston College, looked at the leading norm-referenced tests—including the ITBS—in light of the national council's new math standards, he found that a vast majority of the test items tap lower-level knowledge. “The leading tests are peas in a pod when it comes to the standards,” he says. “They don't reflect them.”
And a survey of 1,000 math teachers conducted by NCTM shows that teachers sense the dichotomy between the new math standards and the tests. Roughly half of the teachers said they emphasize rote drill and practice over problem solving and reasoning because the testing program in their state or district “dictates what they teach.”
Teachers in the Westwood School in Dalton, Ga., say that trying to innovate within the test-driven system has worn them out. Five years ago, while restructuring the school's curriculum, the teachers discovered “Mathematics Their Way,” a program that teaches math concepts through the use of manipulatives.
At the time, Georgia had the most intensive standardized testing program in the country—starting in kindergarten. “The whole system of testing was mind-boggling,” says 1st grade teacher Jimmy Nations, in a sweet southern accent that doesn't mask his anger. “The pressure on people was enormous.”
Nations says that he and his colleagues were operating in a schizophrenic world. “We were trying to do things that we believed as professionals were appropriate for our students,” he says. “At the same time, we were being held to the very rigid state test, knowing our school's scores were going to be published and compared with other schools' scores.”
Math Their Way teaches students math symbols only after they understand the concepts. Teachers at Westwood knew this approach made sense, but they didn't always use it because of the testing requirement. Nations vividly remembers one class where he drilled students on place value before they understood the concept because he knew they would need the lesson to answer questions on the test. “The little kids sat there with their eyes absolutely glazed over,” he recounts. “I felt like a puppet.”
The pressures Nations describes are common among teachers. A study cited in the OTA report sought to describe the effects of high-stakes testing on teaching and learning across the country. Seventy-nine percent of teachers surveyed said they felt “great” or “substantial” pressure by district administration and the media to improve test scores. Half of these teachers reported spending four or more weeks each year giving students worksheets and practice exercises to prepare them for tests.
There is some evidence that tests most strongly influence the academic program in urban and predominantly minority districts. Johnston, who has been studying assessment—including standardized testing—for a number of years, has noticed that testing is most benign in the suburbs, where, by and large, the students do pretty well. “Tests in this case keep the public somewhat off teachers' backs,” he says. But in urban districts, he says, the tests continually point out how things aren't going very well, so teachers feel more pressure to “teach the basics.”
The same sort of thing happens within schools. Students placed in the lowest tracks are most apt to experience instruction geared only to multiple-choice tests, according to Linda Darling-Hammond, a professor of education at Teachers College at Columbia University. These students are rarely given the chance to talk about what they know, to read real books, to write, and to construct and solve problems in math. “In short,” Darling-Hammond writes in an article in The Chronicle of Higher Education, “they are denied the opportunity to develop thinking skills that most reformers claim they will need for jobs of the future, in large part because our tests are so firmly pointed at education goals of the past.”
The limitations of what standardized tests can measure in math have their parallels in language arts. For example, the advanced spelling skills of students in a whole-language program in Washingtonville (N.Y.) Central School District didn't show up at all in their Stanford Achievement Test scores. Last year, students in six whole-language classrooms and six traditional classrooms there scored below the 50th percentile. Yet when the children's spelling was assessed from their writing samples, three-quarters of the whole-language students, but only half of the students in the traditional classes, spelled well. Many more whole-language students than traditional students tried to spell words above their grade level, and they were more successful than the others when they did.
Tests can miss the mark in measuring reading ability, as well. Vivian Wallace, a teacher at Central Park East in New York City, was involved in a study with researchers at the Educational Testing Service. She says they wanted to see if students' scores on the state-mandated test, “Degrees of Reading,” correlated with teachers' assessments of the children's reading ability. Students who read very well do very well on the test, they found. But if a student does not do extraordinarily well on the test, there is almost no correlation between the student's score and his or her actual reading ability.
Ruth Mitchell, associate director of the Council for Basic Education, takes multiple-choice tests head on in her recent book, Testing For Learning. The only place multiple choice is found in the real world, she writes, is at the race track and on the driver's license test. The format promotes passivity, she argues; it asks students to recognize, not construct, the correct answer. And the tests tend to measure what is easy to test rather than what is important for students to learn.
Not all educators agree with her analysis. Education professor Bob Linn of the University of Colorado has been doing research on assessment issues for more than a decade. Although he acknowledges that tests have been misused, he insists that multiple-choice questions can assess more than basic skills.
To a large extent, he says, the multiple-choice questions are fair and revealing. “If you look at the kinds of paragraphs kids are asked to read on the tests or glance through the math problems,” says Linn, “you'll see that most questions are things that parents think their kids should be able to answer.”
But many of the teachers who administer the tests year after year have become outraged with some of the items. Too many questions, they say, are divorced from meaning and context, unnecessarily tricky, and targeted toward the white, middle-class experience. They insist that the test items are not good examples of what a literate person can do.
Nations of Dalton, Ga., complains that the tests aren't in sync with the culture of his children and offers an example. A passage on the state-mandated test is about baking muffins—but his students don't even know what they are. So, every year he teaches an impromptu lesson on muffins. “Somewhere along the line, when I'm talking about cookies, I throw out the word 'muffin,'” he says. “When kids ask what that means, I act surprised and bring a muffin pan to school and bake muffins so that they know what muffins are when they see it on the test.”
Cultural bias aside, one of the things that galls teachers most about standardized tests is the fundamental structure of norm-referencing. When creating a test, test companies choose items that will spread students out the most because that will enable them to assign percentile ranks most reliably. In effect, says Pittsburgh's Resnick, the most interesting items, the ones that everybody can do and ones almost no one can do, are thrown out.
“The worst you can imagine is throwing out those parts of the test that show kids and teachers that they can succeed,” says Resnick. “You also don't want to throw out the ones that are making so much trouble because in a way those are setting the stars to reach for.”
The end product of this sifting of multiple-choice items is a norm-referenced test on which half the students score above the norm and half perform below.
Many educators say that norm-referencing is blatantly incompatible with the belief that all students can learn. Even if all students learned everything we wanted them to, they point out, the tests assure that half the students will score below the mean. “That's an assumption that I could never accept as a teacher,” says King of the Saturn School. “Why in the world would I want to devise a test where half my students were below average, condemned to failure status? Especially when the point is to teach all students; you want everyone to learn everything.”
As part of an ongoing reform project, teachers in the Hilton (N.Y.) School District defined what it means to be a good reader and writer. Good readers, they said, are people who enjoy reading, know what strategies to employ if they aren't successful, understand that the purpose of reading is for meaning, can relate things they read in different areas, know how to respond to what they read, communicate what they read, and share their perspective on what they read.
When they compared their list to the state-mandated test, they discovered that the test measures comprehension of texts, but none of the other things that they believe characterize a good reader.
They went through a similar process for writing, and only three out of their nine attributes for a good writer were even minimally addressed by the test.
When a school community decides what it values and the test doesn't look at any of these things, it is an accident waiting to happen, says Artis Tucker, language coordinator for the district.
The gap is likely to be most pronounced in schools that are heading toward reform. “Where change is being implemented,” Tucker says, “often one set of values operates for instruction and another set of values operates for evaluation.”
For many teachers struggling to improve their schools, this is the crux of the matter: Schools and society should decide what they value, find a way to truly assess students' progress in those areas, and then make that the criterion for whether a program is allowed to live or die.
At Saturn, these issues have not been resolved, and standardized tests are holding the program hostage. The staff has backed off from its bold vision and has devised a plan to raise test scores.
As King explains: “It's important that the school survive and become a model for change. To do that, it has to have a political base of acceptance. The community believes that standardized testing is critical, so it would be foolish not to make sure your kids do well—or the program disappears.”
The Saturn staff has begun a program to familiarize students with test taking—a practice so widespread in this country that it has been given a name, “testwiseness.” Students are told the test is important, given practice items, and instructed in the art of filling out bubble sheets.
The faculty is also reshaping the curriculum to match the standardized tests. For example, the school has abandoned the practice of teaching math through projects in other disciplines; now students are required to take traditional math classes that include drill and practice.
And an innovative plan to create mentorships and internships that link students with members of the community has been pushed to the back burner. What's more, teachers say that the pressure to raise test scores has discouraged them from taking their students out of the school building; they had hoped to structure their courses around the cultural, political, and business resources of St. Paul and Minneapolis.
“We've had to stifle our creativity,” says one teacher, who asked not to be identified. “We haven't been able to focus on creating new and exciting learning opportunities for our students. What has taken precedence are classes that address facts and content that are going to be tested.”
Saturn teachers are confident that the accommodations made to standardized testing will result in higher student scores. But some think the cost will be too high. The pity, says the teacher who asked for anonymity, is that “we are starting to look not so much like the school of the future; we are starting to look like a traditional school.”
Vol. 04, Issue 01, Pages 28-31