For the first time since its inception, the U.S. Department of Education’s What Works Clearinghouse is broadening its definition of “gold standard” research to include nonexperimental designs.
As part of the Institute of Education Sciences’ push to make research more relevant to educators, the clearinghouse has devised standards by which it can consider two new methods for rigorous research. In addition to the randomized, controlled trials long considered the gold standard, education researchers can now use regression-discontinuity designs, which use a cutoff point rather than random assignment to form comparison groups, and, in certain situations, single-case studies. In clearinghouse reviews, studies that adhere to the new standards are considered, along with randomized, controlled trials, to yield the “strongest evidence” to guide educators and policymakers.
The standards, which have been in development for more than three years, were posted on the clearinghouse website over the summer and are already being used to evaluate incoming research, according to Rebecca A. Maynard, the IES commissioner for the National Center for Education Evaluation and Regional Assistance. However, the clearinghouse plans to tweak the standards further this fall and has asked researchers to weigh in on their usability. The clearinghouse plans to produce reports of research that meets the new quality standards, but Ms. Maynard said it may be several months before the first ones come out because there will be an extra layer of review for those reports.
“We want to make sure those standards seem to have been functionally easy to apply and that we’ve been able to apply them fairly and report the evidence in a meaningful way,” she said.
The institute, which serves as the research arm for the Education Department, established the clearinghouse in 2002 in order to showcase high-quality research for educators and policymakers. Yet its rigorous criteria for inclusion, which focus primarily on randomized, controlled trials and quasi-experimental studies, at first found relatively few studies that met the bar. Of those that met the criteria, so few showed strong positive effects that the site was given the moniker the “Nothing Works Clearinghouse.”
Critics have eased up on the nickname, however, as more studies with positive effects came out and the clearinghouse moved to include a broader array of research. For example, it produces guides showing the best practices culled from all research in a given instruction area, such as English-language learners. It also conducts quick reviews of the quality of individual studies on hot topics, such as charter schools.
“I think the What Works Clearinghouse in a way helps shape the marketplace of research; the standards that are contained in the What Works Clearinghouse then shape the way people will go about doing work,” said James W. Kohlmoos, the president of the Knowledge Alliance, a Washington-based group that represents research organizations. “Just the fact that they’re looking at it, to me, suggests they are eager to make the What Works Clearinghouse more usable and more relevant to what the end user needs and wants, and, to me, that’s a positive thing.”
To meet the highest evidence bar previously, researchers generally had to conduct a randomized, controlled trial, in which subjects are randomly assigned to receive an intervention or not, with other factors as closely matched as possible. If the students in the two groups are otherwise the same, the program or intervention being studied can be presumed to cause changes seen between the two groups. Charter schools have become hot topics for experimental design in part because the lotteries used to assign students to overenrolled schools create a natural experiment situation for researchers.
Single-case design studies are much more specialized. They are experiments that involve a single subject, be it a student, classroom, or school district. The researcher repeatedly measures an outcome in response to an intervention under many different conditions. Before implementing the intervention, the researcher takes a baseline measurement, so the subject becomes its own control group.
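The baseline-then-intervention comparison described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `phase_change` and the sample measurements are hypothetical, and real single-case analyses typically involve more phases and more careful inspection than a difference in averages.

```python
from statistics import mean


def phase_change(baseline, intervention):
    """Return the change in the average outcome from the baseline phase
    to the intervention phase for a single subject."""
    if not baseline or not intervention:
        raise ValueError("each phase needs at least one measurement")
    return mean(intervention) - mean(baseline)


# Hypothetical repeated measurements for one student
# (for example, words read correctly per minute).
baseline_scores = [20, 22, 21, 19]
intervention_scores = [27, 30, 31, 32]

change = phase_change(baseline_scores, intervention_scores)
print(f"Average change after intervention: {change}")
```

Because the same subject supplies both sets of measurements, the baseline phase plays the role a comparison group would play in a randomized trial.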
Single-case studies are frequently used in special education research, particularly for less common disorders such as autism, according to Scott Cody, a deputy director of the clearinghouse and the associate director of human-services research at the Princeton, N.J.-based Mathematica Policy Research Inc., which operates the clearinghouse.
“Each single case study is not designed for generalization,” Mr. Cody said, but, “once you’ve got a critical mass of single-case studies, you can start to generalize.”
For the clearinghouse to publish single-case study results, they have to come from at least five studies by three or more research teams in three or more geographic locations and cover at least 20 cases, Mr. Cody said. The clearinghouse is conducting a pilot study this fall to gauge the size of effects that can be expected from those studies.
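The thresholds Mr. Cody describes amount to a simple checklist. A minimal sketch, assuming each study can be summarized by its research team, location, and number of cases; the `SingleCaseStudy` record and the `meets_reporting_threshold` function are hypothetical names for illustration, not part of any clearinghouse tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleCaseStudy:
    team: str       # research team that ran the study
    location: str   # geographic location of the study
    cases: int      # number of cases covered by the study


def meets_reporting_threshold(studies):
    """Check the four thresholds cited in the article: at least five studies,
    three or more teams, three or more locations, and at least 20 cases."""
    teams = {s.team for s in studies}
    locations = {s.location for s in studies}
    total_cases = sum(s.cases for s in studies)
    return (
        len(studies) >= 5
        and len(teams) >= 3
        and len(locations) >= 3
        and total_cases >= 20
    )


# Hypothetical body of evidence: five studies, three teams, three locations, 22 cases.
evidence = [
    SingleCaseStudy("Team A", "Ohio", 4),
    SingleCaseStudy("Team A", "Ohio", 5),
    SingleCaseStudy("Team B", "Texas", 4),
    SingleCaseStudy("Team B", "Texas", 3),
    SingleCaseStudy("Team C", "Oregon", 6),
]
print(meets_reporting_threshold(evidence))
```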
The clearinghouse also decided to write standards for regression-discontinuity studies “because it’s becoming very popular, ... so it seemed important to take a hard look,” Ms. Maynard said. “We’re getting out a little bit in front and hoping we’re able to provide guidance.”
Like randomized, controlled trials, regression-discontinuity studies gauge whether a program or intervention causes certain effects, such as whether a phonics tutoring program increases students’ performance on a reading test. Rather than randomly assigning similar students to receive the intervention or not, a regression-discontinuity study compares students on either side of an objective cutoff point used to assign students to one group or the other. The tutoring program, say, may take only students who have scored lower than 75 percent on a phonics test. The students who scored 74 percent, barely qualifying for the tutoring, and those who scored 76 percent, just out of the pool, start from a nearly identical point, statistically. Researchers then compare the outcomes of the two groups.
“Here you know exactly who got the treatment and why, and you know that any other differences between the groups are because of the treatment,” said Jill Constantine, a clearinghouse deputy director and an associate research director at Mathematica.
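The cutoff comparison in the tutoring example can be sketched as follows. This is a rough illustration, not the clearinghouse's procedure: it assumes a straight-line relationship between the phonics score and the later reading outcome on each side of the cutoff, and the function names, the 10-point bandwidth, and the made-up data are all assumptions chosen for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("scores must vary to fit a line")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def rd_estimate(scores, outcomes, cutoff=75.0, bandwidth=10.0):
    """Estimate the jump in outcomes at the cutoff: tutored students
    (score below the cutoff) minus untutored students (score at or above it)."""
    pairs = list(zip(scores, outcomes))
    below = [(s, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    if len(below) < 2 or len(above) < 2:
        raise ValueError("need at least two students on each side of the cutoff")
    a_lo, b_lo = fit_line(*zip(*below))
    a_hi, b_hi = fit_line(*zip(*above))
    # Compare what each fitted line predicts exactly at the cutoff.
    return (a_lo + b_lo * cutoff) - (a_hi + b_hi * cutoff)


# Made-up data: outcomes rise with the phonics score, and tutored
# students (score below 75) get a 5-point boost.
scores = list(range(65, 86))
outcomes = [0.8 * s + 10 + (5 if s < 75 else 0) for s in scores]
print(round(rd_estimate(scores, outcomes), 3))
```

Fitting a separate line on each side and comparing the two predictions at the cutoff is one common way to isolate the jump; narrowing the bandwidth restricts the comparison to students closer to the 75 percent line at the cost of using fewer of them.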
Regression discontinuity can be more palatable to parents than randomized trials, since researchers don’t have to prevent students from receiving an intervention solely for research purposes. Ms. Maynard said researchers also see such studies as “easier, quicker and cheaper; people are defaulting to that instead of randomized trials.”
A Broader Base
Jon Baron, the president of the Washington-based Coalition for Evidence-Based Policy and the vice chairman of the National Board for Education Sciences, which advises the institute on clearinghouse operations, voiced concern that the clearinghouse may be jumping the gun in declaring that regression-discontinuity or single-case studies can provide results as reliable as randomized trials do. He noted that there have been few attempts to replicate the results of randomized trials using other methods, and those that have been done “show consistently [that] many of the widely used quasi-experimental methods, [including regression discontinuity], don’t do well in replicating the results of a really good randomized, control trial.”
“Those nonrandomized studies are very important, but it’s when the question comes to, is there definitive evidence that a program works, … the answer is generally you can only get to that level of evidence with well-run [randomized, controlled trials] carried out in real-world settings.”
By contrast, Miguel Urquiola, an associate economics professor at Columbia University who has previously critiqued the quality of regression-discontinuity studies, said the new standards do “a good job of simplifying some criteria that could help researchers understand when there are problems. … Under these conditions, RD can give answers that are as credible as an experiment.”
Others, including Eric A. Hanushek, the chairman of the National Board for Education Sciences and an economist at Stanford University, see the new methods as a natural broadening of the clearinghouse’s research base. Mr. Hanushek and Mr. Urquiola both predicted the new standards will improve the quality of regression-discontinuity and single-case studies and encourage researchers to explore new questions using the methods.
“People now accept that rigorous methods can be applied to education problems, that scientific methods can be applied to education and should be,” Mr. Hanushek said.
Moreover, Ms. Maynard was quick to point out that the new methods will not replace experimental designs in IES’s research quiver, as techniques for experimental designs continue to evolve. “Most of the arguments against using [randomized, controlled trials] are a function of not thinking broadly about how you design an experiment,” she said.
Mr. Urquiola agreed. “It opens the field to certain types of questions that [regression-discontinuity] experiments aren’t set to deal with. In education, there are so many questions that one approach can lead to things being quite limited.”
Wayne J. Camara, the president of the National Council on Measurement in Education, said researchers have been waiting for the clearinghouse to broaden its scope in this way for many years. He considers the new standards a good first step, but both he and Mr. Kohlmoos said they would like to see standards to signify high-quality research conducted in many different methods, from experimental to descriptive.
“Only when you see repeated evidence that looks at context and uses different methodologies and perspectives can you … really understand whether interventions will really hold up against the test of time,” Mr. Camara said.
A version of this article appeared in the October 20, 2010, edition of Education Week as “Federal Criteria for Studies Grow.”