School & District Management Opinion

‘Seek Simplicity . . . and Distrust It’

By Lee S. Shulman, June 7, 2005

Alfred North Whitehead’s dictum about the virtues and dangers of simplicity helps explain why we are confused about what kind of evidence should be used to guide education policy. We often have lots of evidence to choose from; the problem is making sense of it and drawing the right lessons. Let’s look at some examples.

Educational researchers David C. Berliner and Audrey L. Amrein, both from Arizona State University, published in 2002 a report on “The Impact of High-Stakes Tests on Student Academic Performance.” They concluded that such testing failed to have the intended positive impact on student learning and was often bad for students. The New York Times ran the story on Page 1. The Times’ editorial page, as well as others nationwide, featured the research and urged caution on the implementation of high-stakes testing. The editorial expressed particular concerns over the testing included in the federal No Child Left Behind Act, through which schools can be financially penalized if test scores show that they are “in need of improvement.” It was a great story and an important one. Research evidence had thrown a serious wrench into the very heart of the Bush (and Clinton) school reform strategies. Or had it?

A week later, economists Martin Carnoy and Susanna Loeb of Stanford University reported their own findings from a similar database (since published in the journal Educational Evaluation and Policy Analysis) and, using different methods of analysis, concluded that Berliner and Amrein got it wrong: High-stakes testing actually was pretty good for kids. A month later, Margaret E. Raymond and Eric A. Hanushek of the Hoover Institution published their own analysis and concluded that high-stakes testing was actually very good for kids. They used much the same data as Berliner, Carnoy, and their respective colleagues, but analyzed it differently, aggregated the information at different levels, and drew conclusions quite different from Berliner and Amrein’s and reasonably congruent with Carnoy and Loeb’s.

These contradictions motivated statistician Henry Braun of the Educational Testing Service to conduct a new study in which he used four different modes of analysis to evaluate the data on the connections between statewide high-stakes testing and student achievement. He concluded that the decisions that researchers made about methods of analysis largely determined which kinds of findings they reported. Analyzed in some ways, the evidence showed positive effects for high-stakes testing; analyzed in other ways, there was no discernible effect.

I happen to know personally most of the players in this drama. They are all serious scholars, careful quantitative analysts, and passionate educators. They reported evidence instead of anecdote or opinion. And they disagreed wildly. What’s a policymaker (or parent, for that matter) to do, especially when we are urged to engage in “evidence-based education”?

A similar conundrum emerged a couple of years ago when a research team from Harvard University, led by political science professor Paul E. Peterson, announced the results of a carefully designed experimental study concluding that school vouchers work to raise academic achievement for poor kids. (The high-stakes-testing studies were not experiments; they were post-hoc analyses of existing databases from the states.) The Harvard team’s claims were challenged by critics, including some of their own collaborators from the policy-research firm Mathematica. The folks from Mathematica cautioned that all we can conclude from this study is that vouchers worked positively for 6th grade African-American boys in New York City. In fact, only if the scores for all the kids in these studies are combined, including those of the African-American 6th graders, would there be a statistically significant benefit for the voucher group. A columnist in The Wall Street Journal attacked the critics, arguing that as long as there was an overall positive effect and no evidence that vouchers were harmful to anyone, it made sense to proceed with this policy initiative.

This Commentary was selected for inclusion in The Last Word: The Best Commentary and Controversy in American Education, published in 2007.

Evidence is supposed to make life easier, or at least more rational, for policymakers in education. Instead of battling over ideologies, we are urged to conduct careful research, design real experiments whenever possible, collect data, and then dispassionately draw our conclusions. Would that the world were that simple. Truth is, research is all about exercising judgment under conditions of uncertainty, and even experimental designs don’t relieve us of those judgmental burdens. The acts of designing experiments themselves involve value judgments, and interpreting the results always demands careful judgment. As the late pioneer in educational psychology Lee Cronbach often observed, even a carefully designed experiment is ultimately a case study, conducted with particular teachers and students, in particular places at a particular time. And the analysis of all studies depends heavily on the analytic methods used, the level at which the data are aggregated and either combined or separated, and the interpretive powers and predilections of the scholars.

For the same reasons that jury members and Supreme Court justices often disagree with one another, and appeals courts often reverse the judgments of lower courts, evidence alone never tells the story. This is not a problem unique to education or the social sciences. Economists battle over whether lowering taxes stimulates the economy more than it increases deficits, and each side offers evidence. In medicine, cancer researchers give competing interpretations of studies on the efficacy of different kinds of mastectomies, and therefore of the value of alternative treatments. Surgeons disagree about the relative value of surgical vs. medical interventions for treatment of atherosclerosis. From global warming to diet and nutrition, scientists conduct studies, offer evidence, and disagree about practical or policy implications.

Does this mean that evidence is irrelevant and research is unnecessary? Does it mean that education policy cannot be based on careful research? Not at all. But we need to give up the fantasy that any single study will resolve major questions. We need to recognize that research evidence rarely speaks directly to the resolution of policy controversies without the necessary mediating agencies of human judgment, human values, and a community of scholars and actors prepared to deliberate and weigh alternatives in a world of uncertainty. Researchers in education (and in most other fields) are rarely neutral. Advocates cite evidence and research. Researchers themselves often are advocates. Indeed, it’s not very interesting for scholars to pursue studies of issues they don’t give a damn about.

So whose evidence should we believe? Let me propose a few preliminary guidelines for adjudicating the claims and counterclaims of conflicting studies.

First, I would live by the motto “Seek simplicity … and distrust it.” It is nearly unimaginable that any one study would support a simple policy conclusion, across the board. If a study claims to demonstrate that “bilingual education doesn’t work,” or that “all high-stakes testing is bad for kids,” or that “phonics is the only way to learn to read,” don’t trust the claim. Most studies of complex policy issues yield results that are themselves complex; they must be interpreted with caution and nuance. In the study of tuition vouchers, for example, the actual findings were highly variable in terms of effects on kids by race, grade, and location. Simple conclusions emerged only if we totally ignored all the variations and seriously oversimplified the findings.

It isn’t that simplicity is unachievable. The preponderance of the evidence on the value of holding back children who “fail” 1st grade appears to be both overwhelming and clear: Holding kids back is educationally worthless. But that’s a simple conclusion that comes from more than a decade of quite different studies, and, in particular circumstances involving particular kids, the best judgment may well be otherwise.

Second, I would give greater credence to any study that was conducted by either investigators who had no discernible stake in the results or, even better, those whose findings run counter to their own values, tastes, and preferences. As Judge David S. Tatel of the U.S. Court of Appeals for the District of Columbia Circuit observed last year, it is very difficult for the courts to take social-science research evidence seriously when it often appears that the scientists doing the research have a political or ideological stake in the desired results.

If conflict of interest is a problem with pharmaceutical research, it is certainly an impediment with educational research as well. In some cases, investigators have a long and public record of advocating for one of the results they offer evidence to support. In other cases, their prior preferences are either unknown or unformed. As we typically do in qualitative studies, we should expect investigators to put their values, preferences, and commitments on the table when they offer their evidence and interpretations. It’s unrealistic to expect that every important study will be conducted by scholars who are disinterested in the findings. We need to go further to increase the credibility of evidence.

Third, I would insist that every major study with policy significance undergo serious peer review before its findings and the policy interpretations associated with them are trumpeted to the media. The review should deal with at least three aspects of the study. How well do the design and analysis permit the claims being made for the interpretation of the data? What other studies offer both complementary and contradictory findings, and how does this study compare with them? And perhaps most important, even if the findings of the research meet the strict canons of scholarly work in one’s discipline, how reasonable are the claims based on the evidence of this study to support the more general policy claims now being put forward?

Each of the three studies on high-stakes testing did undertake forms of peer review, at least with respect to a substantial chunk of the evidence they each presented. But peer review is not a universal process. Current modes of peer review for journals are unbearably slow. Therefore, we need a much swifter mechanism for such critical appraisals if this proposal is to be realistic. How can a serious form of review precede high-profile press releases and press conferences and yet not unacceptably impede dissemination?

Fourth, I would remind investigators that they have a social responsibility to act as “stewards” of their fields. They are responsible not only to zealously conduct their own studies and to organize the rhetoric to support their claims, but they also must, like lawyers, be “officers of the court” who bear responsibility for the fidelity of their work to the integrity of their field. They should so organize their studies that there is someone designated whose role and responsibility is to examine the procedures, data, and interpretations, and ask: “How might it be otherwise? How consistent with the findings is an interpretation opposite to that offered by the study directors?” In many European countries, all doctoral dissertations are defended publicly, with the participation of a formal “antagonist” whose job is to challenge the findings of the study.

A research study needs someone whose job it is to ask how susceptible the evidence and its interpretation are to intelligent (or just plain politically motivated) criticism. In fact, journalists have a professional obligation to be more critical in vetting stories of research before they publish them, to ask about peer review and about the questions raised by the research critics.

The bottom line is that we must move to a more evidence-based strategy for crafting our education policies, but we cannot pretend that there are some forms of research—even controlled experiments—that are guaranteed to provide answers to our questions without requiring the exercise of expert judgment and structured peer review. Evidence informs and enlightens decisionmaking; it does not bypass the need for interpretation and judgment. It’s unrealistic to expect that educational research will regularly be conducted by those who have absolutely no stake in the outcomes. Education is not, and never will be, a values-free zone. Nevertheless, we need ways to review research findings, evaluate the evidence, consider the values inherent in the situation, and render judgments that our citizenry can trust.

Beyond these proposals, I would recommend the formation of a new policy forum to assist in regularly reviewing and evaluating policy-relevant educational research. In some areas, we may need the equivalent of research-review SWAT teams that can be called in on a regular basis to review competing claims and the evidence that supports them. In other cases, the use of “consensus panels” can be quite useful in the face of complex, multiple studies with a range of findings, interpretations, and policy recommendations, though the pace of their efforts can be snail-like.

The National Research Council of the National Academies might well take the lead in such an activity, assisted by a range of both self-consciously partisan and intentionally nonpartisan bodies. Such a forum would organize quick-response review panels and also conduct periodic reviews when serious policy controversies arise. The forum should be nongovernmental, to avoid conflicts of interest with the education policy missions of any federal, state, or local government. (The current swirl of controversy around the Bush administration’s implementation of the No Child Left Behind program exemplifies this problem.)

If we can follow those guidelines, there will remain big, unanswered questions about the impact of high-stakes testing on the achievement of elementary school kids, and about the value of vouchers to reduce educational inequality. But we will have much more confidence in the value of the evidence put forward to help us traverse the thickets of education policy. I can assure you, however, that the picture that emerges from the evidence won’t likely be simple. That’s not necessarily a problem with the quality of the research; it may simply be a characteristic of the world in which we live.
