The Limited Value of 'What Works' Research
Ever since educational research became an academic discipline more than a century ago, researchers and educators have been vocal in their dissatisfaction over its impact on practice. For decades, education research has been criticized as confusing, irrelevant, and of little practical use, fueling a pessimistic view that research probably will not lead to better schools.
In response, the federal government and the research community have zeroed in on so-called “what works” research, and, in recent years, studies have mushroomed to answer a broad range of policy questions, such as: Do school vouchers work? Does technology improve student learning? Are private schools better than public schools? At the same time, existing studies on intervention strategies and programs are scrutinized to provide educators, policymakers, and the public with trusted and easy-to-understand scientific evidence of effective programming. The federal What Works Clearinghouse is a premier example of such endeavors.
This is all well and good, but we would argue that it is far from enough. We believe it is time for a shift in research: Instead of determining whether programs work or not, attention should turn to identifying the students for whom a program is effective and the contextual conditions that make a program work. What’s more, local schools should conduct rigorous studies to determine whether programs and initiatives will work for their own students, which is the key to sorting out relevance and applicability.
Undoubtedly, “what works” efforts deserve much applause and appreciation, but, unfortunately, they hold only limited value for educators, for two reasons. First, while researchers are pursuing what works in general, what matters to practitioners is what works in their particular setting. Educators know that a program deemed effective by researchers will not necessarily work or may have a rather different impact in their own schools.
Researchers, for their part, are also keenly aware of this discrepancy, given the inconsistent findings that have plagued educational research almost since its inception. To reach a general conclusion about whether an intervention, an instructional strategy, or a policy works, researchers usually either conduct large-scale studies that include a great number of participants from a wide range of settings or review many rigorous studies conducted in different contexts. By doing this, researchers can more reliably generalize their findings to make them widely applicable. Thus, despite the varied effects in different settings or inconsistent findings from different studies, a single number is produced to represent an average effect. Because of the simplicity of this approach, such “average effect” findings can easily attract media attention and become headlines in professional magazines and newspapers for educators. They are often recommended, through various channels, to practitioners as being scientifically proven and readily applicable.
However, what is lost in the translation is that the average effects are simplified representations of varied, and many times even contradictory, results. Knowing that an intervention works (or not) only tells us that the overall average effect of the intervention in all schools where it is implemented is expected to be positive (or zero). It does not tell educators in a school district whether the intervention will work (or not) or how well it will work in their schools or classrooms.
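The arithmetic behind this point can be made concrete with a minimal sketch. The district names and effect sizes below are invented purely for illustration and come from no actual study; the sketch assumes only that an "average effect" is a simple mean of district-level effects.

```python
from statistics import mean

# Hypothetical effect sizes (in standard-deviation units) for one program
# in five districts. These numbers are invented for illustration only.
district_effects = {
    "District A": 0.40,
    "District B": 0.30,
    "District C": 0.05,
    "District D": -0.10,
    "District E": -0.15,
}

# The single number a "what works" summary would report.
average_effect = mean(district_effects.values())

# What the single number hides: where the program helped and where it did not.
helped = [name for name, effect in district_effects.items() if effect > 0]
harmed = [name for name, effect in district_effects.items() if effect < 0]

print(f"Average effect: {average_effect:.2f}")
print(f"Positive effect in: {helped}")
print(f"Negative effect in: {harmed}")
```

In this made-up case the average is positive, so the program would be reported as one that "works," even though two of the five districts saw negative effects.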
There is an irony in what we think educational research can do to improve practice and how we actually go about it. On the one hand, there is a consensus that one size does not fit all in education, and there has been a call for educational research to study what works with whom and under what conditions. On the other hand, we have been pushing research to show that intervention programs can produce “scalable” results, which, in plain English, means that their effects can be replicated in many different settings. Although never explicitly spelled out, the full name of the “what works” we have been pursuing is really “what works for anybody in any context and under any condition.”
The second problem of the “what works” research is that it says little about how an intervention should be implemented once it is found to be effective. From both common sense and experience, educators know that one program rarely works for everybody. By the same token, it is highly unlikely that a program will work for nobody. Placing the right students in an intervention program is critical. The what-works research is often vague, however, about the target population of intervention programs. For example, in the intervention programs reviewed by the What Works Clearinghouse, terms such as “at risk” and “below proficient level” are frequently used to describe the target students. But “at risk” is often defined differently in different school districts and even among schools in the same district. There is also plenty of evidence that state tests vary greatly in difficulty. Given the discrepancies, how can educators be confident that they are serving the right students simply by following these loosely defined program-placement guidelines?
To produce unequivocal answers that can be directly applied by educators, the what-works research has been obsessed with the overall average effect and has paid little attention to variation in impact. However, why a program works in one district but not in another, or why the effect is greater in one setting than in another, is extremely relevant to educators. Unraveling the reasons behind the varied results can help educators avoid pitfalls and model successful implementation.
And, while it is important to tell educators what works, it is equally—maybe more—important to inform them about how to make a program work. When large-scale studies with diversified participants are reviewed, much can be learned about what works for whom and under what conditions. Unfortunately, that is not the focus of the “what works” research that has been produced.
In education, we tend to do the same thing over and over. What-works research is no exception. In 1986, the U.S. Department of Education published its then-famous pamphlet “What Works: Research About Teaching and Learning,” in which past research was critically reviewed and distilled into 41 significant findings or conclusions intended to help parents and educators adopt the most effective practices. The pamphlet was widely heralded, and a half-million copies were distributed. A year later, Chester E. Finn Jr., then the department’s assistant secretary for educational research and improvement and the chief author of the pamphlet, asked 18 high school principals about it. To his dismay, four had heard of it, two had seen it, and one of those two had discussed it at a faculty meeting. Twenty years later, we are doing essentially the same thing with no assurance that word is spreading any more effectively.
In a sense, producing the what-works research and expecting practitioners to apply the findings is analogous to prescribing drugs regardless of a patient’s medical history and without giving instructions on when to take the medications and how. No one believes this is good medical practice, and there is probably no reason to pursue similar measures in education. We are not suggesting that the what-works research is useless and should be discontinued. But we do believe the focus should be something different.
Now that quantitative data have become the de facto way of assessing education, we can always come up with an average number regarding a program’s or policy’s effectiveness. The problem is that this single number, no matter how scientifically and painstakingly derived, has little value for practitioners. It’s time for a change in defining what works.
Vol. 30, Issue 28