It's high time we carefully and objectively assess all the important consequences of our new assessment practices.
In March, the Public Agenda research organization published its fifth annual Reality Check survey, which asked middle and high school students about their experiences with achievement testing. (“Public Agenda: Reality Check 2002,” March 6, 2002.) Public Agenda described the survey’s results with a headline proclaiming that “... few students are unsettled by testing” and a conclusion saying that “... public school students nationwide appear to be adjusting comfortably to the new status quo.” IBM chief executive officer Louis V. Gerstner followed immediately with a pro-testing editorial in The New York Times, in which he cited Public Agenda’s survey as proving that “the great majority of middle and high school students are comfortable with the increased testing in public schools.”
A closer look at the data on which these conclusions are based reveals that they are unjustified. Probing a bit on Public Agenda’s Web site, we find that the actual wording of the key question was this:
Which best describes how nervous you get when you take standardized tests?
(a) I don’t get nervous at all.
(b) I get nervous but I can handle it.
(c) I get so nervous that I can’t take the test.
Of the national sample of 600 students, 23 percent chose (a), 73 percent selected (b), and 5 percent checked (c).
Nearly three of every four students responded that they do get nervous at testing time but “handle it.” What does this tell us? Very little, given the wording of the question, because we have no way of knowing how students were distributed across the spectrum of stress levels. Students who are terrified but somehow manage to take their test would pick this response alternative, as would students who experience only a twinge of anxiety. Yet for Public Agenda, the popularity of this response proves that “few students are unsettled.” For Mr. Gerstner, it translates as “a great majority” being “comfortable.”
Public Agenda describes the response rate for (c) this way: “Only a handful of students (5 percent) say they ‘get so nervous’ that they can’t take the tests.” This “handful” would amount to 135,000 students in California alone. Public Agenda may regard this problem as a small price to pay in pursuit of accountability, and indeed, it offers no expression of concern, no comment of any kind. Many of the rest of us, however, will be less sanguine that one or sometimes two students in a typical classroom are so overwhelmed by anxiety that they can’t function at all at testing time.
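For readers who want to check the arithmetic behind these figures, here is a minimal sketch. It uses only the 5 percent response rate and the 135,000 estimate quoted above; the class sizes are hypothetical values chosen for illustration, and the function names are our own, not anything from Public Agenda.

```python
# Figures taken from the article
PERCENT_CANNOT_TEST = 5          # share of students choosing response (c)
CALIFORNIA_AFFECTED = 135_000    # the article's estimate for California


def implied_population(affected: int, percent: int) -> int:
    """Student population that the affected count implies at the given rate."""
    return affected * 100 // percent


def affected_per_class(class_size: int, percent: int) -> float:
    """Expected number of affected students in a class of the given size."""
    return class_size * percent / 100


if __name__ == "__main__":
    population = implied_population(CALIFORNIA_AFFECTED, PERCENT_CANNOT_TEST)
    print(f"Implied California student population: {population:,}")

    # Class sizes below are assumptions for illustration, not survey data.
    for class_size in (20, 30, 40):
        expected = affected_per_class(class_size, PERCENT_CANNOT_TEST)
        print(f"Class of {class_size}: about {expected} students")
```

Under these assumptions, the 135,000 figure corresponds to a population of roughly 2.7 million students, and a class of 20 to 40 would include between one and two students who say they cannot take the test.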
Public Agenda phrased some questions in an impartial way. For example, it asked, “Do you think you have to take too many of these tests, too few, or are things about right?” Here, 25 percent of respondents answered “too many,” 4 percent answered “too few,” and 71 percent answered “about right.”
But more often Public Agenda’s phrasing of questions, and its description of results, reflect a pro-testing bias. For example, it asked:
“Do your teachers focus so much on preparing for these standardized tests that they neglect other important topics, or does this usually not happen?”
Here, bad-news responses are effectively minimized by the use of “usually,” by the either/or syntax, and by the fact that students are unlikely to know what additional topics would have been covered in the absence of test-preparation activities. Unsurprisingly, 78 percent of students answered, “This usually does not happen.” Again, this tells us little.
More revealing is that on two other questions, 80 percent of students answered that “... [my] teachers usually spend class time helping students prepare for these tests,” and 45 percent agreed or strongly agreed with the statement, “My school places far too much emphasis on standardized test scores.” These response rates ought to raise concerns, especially given the insertion of “usually” and “far,” which work to reduce the frequency of agreement.
Despite this mixed and perhaps contradictory set of results, Public Agenda paints a rosy picture. “Reality Check picks up very little evidence of strain,” it concludes.
Perhaps what is most important to recognize is that Public Agenda did not survey the students most vulnerable to testing stress—those in elementary schools. Younger students are more likely to lack the cognitive and emotional skills for dealing with high-pressure situations. Under the recently enacted reauthorization of the federal Elementary and Secondary Education Act, we are about to begin testing nationally from 8th down through 3rd grade. Some states, including California, are already testing 2nd graders.
Why isn’t Public Agenda interested in the testing experiences of elementary students? If elementary students are considered old enough to take the tests, they ought to be old enough to be surveyed, especially by telephone, which was the method used here.
Whatever Public Agenda’s failings, we as a nation face a much larger problem than hidden bias in a single survey or even a series of surveys from such a high-profile source. We suffer from a dearth of solid information about the broader effects of high-stakes testing. While we have a wide array of data to assess the intended effects of testing—reducing the “achievement gap” or ensuring that high school graduates possess specified skills and knowledge, for example—we have very little to help us assess the unintended effects. We can, for instance, readily determine whether reported achievement gains in Texas are real by comparing Texas Assessment of Academic Skills scores with the state’s National Assessment of Educational Progress and SAT scores, and by also using trend data on student demographics, retention-in-grade rates, special education placements, and dropout rates. (Many of the reported TAAS gains do not hold up under scrutiny.) But we have almost no systematic data, in Texas or anywhere else, to help us assess testing’s effects on teacher morale and retention rates, school climate, time spent teaching different subjects, quality and depth of instruction, and many other pivotal dimensions of schooling.
From my own vantage point, working closely over time with a small number of school districts scattered across the country, the negative effects of testing are serious and are growing steadily. More and more class time consumed by test-prep activities. More instruction of the drill-and-kill variety. The short-shrifting of untested subjects such as science and history, not to mention art and music. More teacher frustration and resentment, and early retirement by some of our best veteran teachers. A shortage of highly qualified candidates for principalships. Gerrymandering of special education placements, interschool transfers, and even, in one instance, school attendance boundaries.
But, of course, these are only my personal observations and should not be taken any more seriously than the observations of someone who has witnessed a range of positive effects in other settings. Individually, we each can see only a small part of the elephant, and inescapably, we do so through our own subjective lens.
High-stakes testing is a radical change. It is an unprecedented centralization of power at the state and federal levels. It is an unprecedented form of pressure on districts, schools, teachers, and students. We would be arrogant or naive to think that we can know in advance all the consequences of such a swift and drastic policy shift. Many dollars are going to study the intended effects of this change; virtually none are going to track its broader effects, positive or negative. It’s high time we carefully and objectively assess all the important consequences of our new assessment practices.
Eric Schaps is the founder and president of the Developmental Studies Center in Oakland, Calif.
A version of this article appeared in the June 05, 2002 edition of Education Week as High-Stakes Surveys