The What Works Clearinghouse is the flagship initiative of the federal Institute of Education Sciences. Begun in 2002, it was intended, according to its Web site, to “help the education community locate and recognize credible and reliable evidence to make informed decisions,” and to provide educators with a “central and trusted source of scientific evidence of what works in education.” Scientifically valid and clearly written reviews of research on practical programs are essential to evidence-based reform, so the clearinghouse has been eagerly awaited by all who believe that educational practice should emphasize programs with strong evidence of effectiveness.
After five years and more than $30 million, the clearinghouse has finally begun to produce significant numbers of reports on the evidence base supporting various educational programs. But the reports make it clear that the clearinghouse has failed. Its arcane and poorly justified procedures have produced information that is neither scientifically justified nor useful to educators.
Recently, the What Works contract was awarded to a new contractor, Mathematica Policy Research Inc. This transfer provides an opportunity for a fresh start—one that is needed if the clearinghouse is to accomplish its worthy goals.
The problem with the What Works Clearinghouse is that although its rules appropriately emphasize random assignment, they ignore design elements with far more potential for bias than lack of random assignment. As a result, the clearinghouse gives its highest ratings for evidence of positive effects to programs supported by studies that are often very small, very brief, very biased, or seriously flawed in other ways. Such ratings fail to give educators valid or meaningful information on the programs they might use to improve their students’ achievement.
One example is in the middle school mathematics topic area. The clearinghouse gave its top rating, “positive effects,” to only one program, Saxon Math. Two randomized and four matched studies met clearinghouse standards.
To get into the top category, a program must have significant effects in at least one randomized study and one other study. The unpublished randomized study that qualified Saxon Math for the “positive effects” rating involved 46 students taught by one teacher in one high school. The only outcome measure was devised by the study’s author and was closely aligned with the Saxon Math curriculum (but not with the curriculum used in the control group). The other small randomized study found no differences, and two of the four studies that used conventional measures of math not keyed to the Saxon Math curriculum found effects favoring the control group. The median effect size across the four studies that used conventional measures was only +0.06. (An effect size is the proportion of a standard deviation separating experimental and control groups; most researchers consider it educationally meaningful if it is +0.20 or larger.)
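For readers who want to see the arithmetic behind an effect size, the following is a minimal sketch, not the clearinghouse's own procedure. It assumes the common pooled-standard-deviation form of the standardized mean difference. The function names and all the numbers are hypothetical, invented purely for illustration; only the +0.20 benchmark comes from the discussion above.

```python
from math import sqrt
from statistics import mean, median, stdev

# Benchmark cited in the text: +0.20 or larger is considered meaningful.
MEANINGFUL_THRESHOLD = 0.20


def effect_size(experimental, control):
    """Standardized mean difference between two groups of scores.

    Assumption: the difference in means is divided by the pooled
    standard deviation of the two groups.
    """
    n_e, n_c = len(experimental), len(control)
    pooled_variance = (
        (n_e - 1) * stdev(experimental) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_e + n_c - 2)
    return (mean(experimental) - mean(control)) / sqrt(pooled_variance)


def is_educationally_meaningful(es):
    """True when an effect size reaches the +0.20 benchmark."""
    return es >= MEANINGFUL_THRESHOLD


# Hypothetical test scores for one small study (not real data).
experimental_scores = [2, 4, 6]
control_scores = [1, 3, 5]
single_study_es = effect_size(experimental_scores, control_scores)

# Hypothetical effect sizes from four studies (not the actual Saxon Math
# values). A negative value means the control group scored higher.
study_effect_sizes = [-0.15, 0.00, 0.10, 0.25]
median_es = median(study_effect_sizes)

print(f"single-study effect size: {single_study_es:+.2f}")
print(f"median across studies:    {median_es:+.2f}")
print(f"median is meaningful:     {is_educationally_meaningful(median_es)}")
```

Taking the median rather than the mean keeps one unusually large or small study from dominating the summary, which matters when the individual studies are small.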
Another egregious example relates to a program called DaisyQuest, computer software designed to teach phonemic awareness in grades K-1, listed in the clearinghouse as having “positive effects” on “alphabetics.” The DaisyQuest studies involved about five hours of computer instruction. Sample sizes were extremely small: 49 in one study, 27 in another, 69 in a third. Worse, outcome measures included activities taken from the DaisyQuest program (which experimental students had practiced and control students had never seen). In fact, control students were not being taught phonemic awareness at all. In one of the studies, ignored in the ratings, a comparison treatment was used in which a teacher taught phonemic awareness to a group of children, and those children scored far better on the Phonological Awareness Test than those who experienced DaisyQuest (effect size = -0.44).
Studies like those of DaisyQuest are the rule, not the exception, among programs rated “positive” in the clearinghouse’s beginning-reading topic report. Other programs rated “positive” included Kaplan SpellRead, a tutoring program validated as having “positive effects” in alphabetics by an eight-week study in a single school in Newfoundland involving 47 children. Another tutoring program, Stepping Stones to Literacy, was rated “positive” for alphabetics based on a five-week study involving just 36 children, in which tutoring was delivered by project staff members.
A four-week study of an ill-defined intervention called Peer Tutoring and Response Groups evaluated a process-writing model in which 4th graders in the experimental group worked in small groups to plan, draft, edit, and finalize compositions. The small groups were composed of English-language learners and fully proficient English-speakers. On the final test used as the outcome measure, children were asked to write a composition. In the experimental group, children were allowed to help each other write their compositions, while children in the control group wrote by themselves. Not surprisingly, experimental ELL students wrote significantly more words in their compositions, with the help of their English-proficient groupmates. This outcome qualified Peer Tutoring and Response Groups for a “positive effects” rating for “English-language development” in the English-language-learning topic report.
The clearinghouse is unaccountably inconsistent from topic to topic. It requires that studies have a duration of at least a semester in math, but there is no duration requirement in reading or in programs for English-language learners. This means that if DaisyQuest, Stepping Stones to Literacy, SpellRead, or Peer Tutoring and Response Groups had been math programs, their key studies would have been excluded. The middle school math review considers only textbook programs, ignoring computer-assisted instruction and programs that focus on changing instructional processes (such as cooperative learning). Elementary math includes computer-assisted instruction, but not instructional-process programs. Other reviews of math programs have found that the instructional-process programs excluded by the clearinghouse have the strongest positive effects in the most rigorous evaluations, yet a reader of the What Works Clearinghouse would never know this.
The current clearinghouse will not pass muster among the scientific community, among educators, or among policymakers. Its Web site should be taken down immediately while new procedures are devised. The new procedures should refocus on giving educators unbiased information on the likely outcomes of programs available to them today to accomplish their most important goals, as outlined in the What Works Clearinghouse’s mission statement. To this end, the procedures should place strict standards on outcome measures (to remove those biased toward the experimental group) and on duration (requiring at least 12 weeks of intervention), to ensure that highlighted studies are meaningful and fair. To give appropriate emphasis to large, unbiased studies, the clearinghouse should compromise on statistical adjustments for clustering and other statistical issues that do not introduce bias. Program ratings should not be strongly influenced by one or two small studies, but should emphasize programs evaluated with many students in many schools.
Some day, the What Works Clearinghouse could become a reliable, respected source of information regularly consulted by educators and policymakers as they make critical decisions for children. As it currently stands, however, the clearinghouse is counterproductive, communicating to educators, policymakers, and researchers that research to date has little to offer them in choosing proven or promising programs. Mathematica has an opportunity to make a fresh start with a new clearinghouse that truly represents the accumulated findings of high-quality research.
Educators and policymakers have been promised fair, meaningful, and useful information they can rely on to make wise decisions for children. With a fresh start, the What Works Clearinghouse can still fulfill this promise.
A version of this article appeared in the December 19, 2007, edition of Education Week as “The What Works Clearinghouse: Time for a Fresh Start.”