Unmasking the Low Standards Of High-Stakes Tests
Must raising standards mean standardization?
In an apparent attempt to reclaim the offensive in the
standards-and-testing debate, Robert Schwartz and Matthew Gandal,
writing in these pages, have seriously misstated the position of those
raising the alarm about the consequences of high-stakes testing now
under way in several states. ("Higher Standards, Stronger
Tests: Don't Shoot the Messenger," Jan. 19, 2000.) They contend
that the critics "come out of the woodwork when passing rates on tests
are low" and "blame the messenger, rather than take on the shortcomings
that standards and assessment expose." The charge is not only untrue,
but seriously distorts the deep and troubling concerns expressed by
large numbers of parents, teachers, researchers, and legal experts
about the limitations of high-stakes-testing policies. We believe
passionately in raising student achievement, but we disagree that
raising standards requires standardization.
To set the record straight: We critics are not, as Messrs. Schwartz and Gandal suggest, Johnny-come-latelies to the controversy over standards and high-stakes testing. Nor are we back-benchers. Many of us are front-line, in-the-trenches practitioners who work in schools and whose track record with real students, not statistical proxies, attests to our long-term commitment to high standards. The debate has been one-sided. We don't command the same access to the media as do the writers, politicians, or state education commissioners, but we have consistently and urgently expressed alarm regarding the consequences of linking single-event tests, whatever their nature, to critical decisions such as graduation and diploma granting.
We have argued to whoever will listen that one size fits few: Common sense tells us that there are, as Howard Gardner puts it, multiple forms of excellence and multiple ways of finding out what kids know and can do.
We have argued that since learners are complex, assessments should be too, that no single instrument or set of instruments should be used to make life-determining decisions. While proponents of high-stakes tests admit that their instruments are imperfect, they seem to be willing to live with rates of failure ("casualties," as one state commissioner of education put it) that are alarming to us. There are some predictions of dropout rates as high as 50 percent!
We have argued that high-stakes tests devalue high-quality education. Just listen to teachers across the country detail how these tests have trivialized curriculum, or pay a visit to schools throughout New York state and quickly see how the tests have taken control of the school day. Already, music and art classes have been pushed to the side to make way for more test-oriented English classes.
Maybe this is why none of the high-performing independent schools in New York state gave the state regents' tests in the past or plan to use them now. Maybe this explains why the high-performing Westchester County districts have been seeking a variance from the tests. Perhaps this explains why, at a recent legislative hearing, the spokesman for New York's archdiocesan schools reluctantly admitted that Roman Catholic schools were giving the tests because they had been "coerced."
And we have done more than argue. We have developed schools that require students to demonstrate a high level of competence in ways that meet and exceed state standards, and we have done it in ways that engage young people far more effectively than high-stakes tests. We have shown that, unlike high-stakes testing, which statistically favors the more privileged, performance assessment results in genuine equity by leveling the playing field even with youngsters for whom standardized testing is often an obstacle.
So, enough of this talk about "shooting the messenger." Let's open the envelope and examine what's inside. Let's see what these "tougher standards" look like, as revealed on the English-language-arts regents' exam required in New York for high school graduation and see whether these high-stakes tests adequately or fairly represent high standards:
Question No. 1 on the English-language-arts regents' exam (June 1999), which apparently is intended to test listening skills, asks students to respond to a speech on the Suzuki method of teaching the violin. Following the speech, which is read aloud by test proctors, students are required to answer six multiple-choice questions like this one: According to the Suzuki method, which step comes first? (1) playing by ear, (2) reading written notes, (3) listening to music, (4) writing original tunes. Then they are told to write a letter to their "school board recommending whether or not the Suzuki violin method should be taught in your district."
The content focus of this item (that is, the Suzuki violin method) raises serious issues with respect to relevance. Some experts argue that using such remote and artificial topics undermines students' capacity to demonstrate the higher-order-thinking skills the test is designed to measure. Students are expected to listen to the proctor read a speech on a subject quite remote to most students, then write a letter arguing a position about which they have only the most superficial knowledge. This contrasts sharply with how we actually want students to behave.
In our classes, students are taught to form opinions by conducting research on multiple perspectives, asking questions, and analyzing the findings. This test item totally obviates such rigor. In fact, it raises serious questions of validity: What exactly is it that this test item is supposed to measure? Is it content or test-taking expertise? And how do test proponents defend the reliance on individual proctors to read aloud the selection? It may be a small thing, but, as we have observed in this election year, delivery counts a lot.
Question No. 2 is apparently designed to test for "information and understanding." Using information on the history of child labor, students are instructed to "write a report summarizing some provisions of current New York state law regarding the employment of children and discussing the conditions that may have led to those provisions."
Test-takers are provided with a chart and a 3½-page reading on the topic and are required to answer 16 multiple-choice questions on the reading material. Of these 16 questions, only one requires students to apply inference skills, another is a "main idea" question, and the remaining 14 involve recall of specific detail from the reading. Here's a typical question: As a result of the 1938 Wages and Hours Act, children are not allowed to (1) earn minimum wage, (2) work after school, (3) hold dangerous jobs, (4) pay income taxes.
While we obviously want students to be able to locate and retrieve information from a reading selection, this activity could never qualify as the primary focus of an intellectually rigorous curriculum. The research papers we assign our students require, in addition to in-depth understanding of factual information, an emphasis on how to apply such information for a given purpose, how to assess its value, and how to weigh it against additional evidence, not merely to restate it.
Is this what we mean by higher standards? In our own schools, students are taught to read and analyze materials reflecting multiple perspectives, and they are expected to demonstrate the ability to develop a logical argument based on those perspectives. Our schools focus on teaching students ways to assess diverse points of view, to act as historians, to identify reliable sources of information, to debate ideas, and to demonstrate these skills in thoughtful, rigorously argued research papers.
Consider these research papers recently completed by students in our performance-assessment schools: "Did Lincoln Free the Slaves?" "How Should Columbus Be Regarded Today?" "Why Did the United States Become Involved in Vietnam?" "Did King Make the Movement or Did the Movement Make King?" As these titles suggest, performance assessment provides a vehicle by which to achieve the high standards called for by the state board of regents in more authentic and effective ways than does the English-language-arts exam.
Question No. 3 on that test asks students to read an essay and a poem on "the influence of teachers on the lives of students." Students are required to answer 10 multiple-choice questions before beginning their own essays. Again, the multiple-choice questions place a far greater emphasis on the recall of specific factual information than they do on inferential or implied understandings (the more sophisticated reading skills college work requires).
Both the content, distinguished by its artificiality, and the language used in the task's instructions (write a "unified essay" with a "controlling idea") require students to suspend their normal learning behaviors, move into "test mode," and write about something quite artificial, within time constraints that undermine quality work, in a manner that is formulaic.
Finally, Question No. 4 on the test asks students to provide a "valid interpretation" of a statement proposed as a "critical lens" by comparing two works of literature the student may have read. Leaving aside the ambiguous use of such terminology as "critical lens," it is the exact quote that is even more troubling: "In literature, evil often triumphs, but never conquers."
Students in schools using performance assessment are frequently required to write literary essays in which they compare two works of literature with respect to genre, period, literary technique, or style. Yet, even our most sophisticated readers and writers might be handicapped by this "critical lens" statement. In one of our classes, for example, students' reading includes Catch-22, The Painted Bird, An American Tragedy, Madame Bovary, The Hunchback of Notre Dame, One Flew Over the Cuckoo's Nest, and "Agamemnon." Since one could effectively argue that, indeed, evil both triumphs and conquers in these literary works, our students would be penalized for not answering the question posed.
And what is the connection between questions like these and the standards themselves? Close examination suggests very little. Many critical skills listed in the New York state standards are simply ignored in the examination itself. How did the test-makers choose which standards to emphasize? No one has provided that information.
"Speaking," for example, is emphasized in all four of the state English-language standards. According to the regents who set the standards, students should "present orally" well-developed analysis of issues, ideas, and texts. Considering the skills expected of students in most college and work settings, such a standard is well-chosen. Yet, nowhere in the exam is oral presentation evaluated. Thus, schools that take the standards seriously and emphasize discussion, question-asking, oral analyses, presentation, and the development of succinct, informed, and thoughtful oral responses are undermined by the state's own assessment system.
There are other omissions. Despite the prominent place given to multicultural literature (it is listed at the top of English-language-arts Standard 2), there is no exam question requiring students to draw on such experience or knowledge. Moreover, despite Standard 2's detailed discussion of literary terminology, there are no exam questions relating to this point.
The English-language-arts exam is six hours long (making it six times longer than the SAT II exams and three times longer than a typical Ph.D. defense). Given over two days, it requires students to apply a set of test skills (pacing, format, suspension of one's belief system, test-taking terminology, and an understanding of multiple-choice-question structure) that are quite unlike skills required of students in classrooms where engagement with and ownership of material, reflective behaviors, revision skills, and consideration of multiple perspectives are emphasized.
To prepare students for such tests requires teachers to exchange a rigorous curriculum (one in which students prepare analytic essays, thoughtful research papers, original science experiments, and sophisticated math applications) for repetitious drill on timed practice tests. Eventually, as such exams become embedded in the schools, students, understanding that less is required, will reject the more rigorous efforts demanded by performance assessments. High standards will be replaced by test-driven lower ones.
The results are already evident: fewer entries in writing competitions, less time for in-depth analysis during classroom discussions, disregard for subjects that won't be tested, depersonalization of teacher-student relationships, and the beginning signs of student alienation and climbing dropout figures. (Already, some schools that serve the most marginal students have reported that their registers have declined by a third.)
A high-quality education encourages students to set long-range goals, learn persistence and time management, and practice reflection and revision skills that further education and lifelong learning. High standards will not be achieved if these goals are neglected.
Ann Cook is the director of the Urban Academy Laboratory School, Cece Cunningham is the director of Middle College High School, and Phyllis Tashlik is the director of the Center for Inquiry in Teaching and Learning, all in New York City. The schools are members of the New York Performance Standards Consortium, a network of 40 New York state schools that have developed and use a system of performance assessment.