In launching an unprecedented effort to improve school achievement and other youth outcomes by “scaling up” evidence-based programs, the Obama administration has given education a golden opportunity on the research front. With new funding, the administration has shown its commitment to supporting “what works.” These scale-up initiatives (including the White House’s Social Innovation Fund, or SIF; the U.S. Department of Health and Human Services’ efforts on teen-pregnancy prevention and home visitation; and the U.S. Department of Education’s Investing in Innovation Fund, or i3) present a real opportunity for us to learn to do this work better and make lasting impacts on the lives of young people.
The i3 Fund is likely the effort most familiar to educators. Among the 49 successful i3 applicants are Success for All, KIPP’s Effective Leadership Development Model, Teach For America, and Reading Recovery, each of which will try to reproduce its effects in new communities, schools, and classrooms. This work faces a serious challenge, however. Research and prior scale-up efforts have shown that programs that are effective at small scale (perhaps because they were implemented in favorable circumstances by the original developer) have trouble maintaining that effectiveness when extended more broadly. We see that the expanded programs make a difference in some locations, but not others, and with some youths, but not enough.
So, how can we scale up good programs effectively? What leads to achieving robust implementation, reaching young people who need the services, and recognizing sites that can support innovation? More specifically, what are the right strategies for expanding such programs, what types of organizations can effectively implement them, and how do local policies and other conditions influence their effectiveness? When is the introduction of a new program an improvement on the status quo? Prior research gives policymakers and practitioners almost no guidance on these important issues, so they have to rely on their past experiences and hope their new work produces positive results at scale.
We need more than practitioner wisdom to improve the success rate of such initiatives; we need to learn from strong data as we go. The good news is that these initiatives give us the chance to examine with whom and under what conditions programs work, knowledge that can serve as the missing ingredient in the “what works” agenda.
To learn more about what influences program effectiveness, we need three elements: (1) reliable estimates of a program’s impact in a large number of different sites, (2) good measures of the background characteristics of the participants, and (3) data on the conditions within and outside a program that might influence results.
The scale-up initiatives can meet the first criterion, at least potentially. Particularly in i3, proposals that received the biggest grants had to include strong impact evaluations of their expanded efforts. While these awards are grants and not contracts (funders have more control under a contract), funding agencies need to strongly encourage the winners to deliver on their promised impact evaluations. If such rigorous evaluations are done, we will know how much of a difference these innovations make in each new site. So, if the historical problem of varying effectiveness repeats itself, we will be poised to understand why.
For the second element, we can leverage the promised impact evaluations. For example, one finding common to evaluations is that programs are more effective for some subgroups of a target population than for others. Many educational interventions have little or no impact on schools, teachers, or students who do not really need the intervention, or on those who need more than the intervention can deliver. If the Department of Education asks the various local evaluators to gather a common set of baseline data for students involved in the i3 expansions, uniformly capturing information such as age for grade, English-language-learner status, and prior achievement, we could look across sites and groups of similar innovations for patterns. Maybe certain i3 strategies will be most effective with particular students. If we don’t gather common data on the students across sites and grants, we will never find out.
Similarly, for the third item, most developers believe that the innovations will make the most difference when implemented in communities or schools that have a commitment to a new effort, the human and financial resources to put an innovation in place, and few comparable programs already. However, this practical wisdom has not been confirmed by strong data. We should ask the local evaluators to gather uniform information on these factors across the sites, since variation is highly likely. Then, we can see if practical wisdom is borne out in results.
Gathering good data from these scale-up initiatives can also tell us if we’re thinking about past evidence in sensible ways. Is it as important as we think? Does it predict results under some circumstances, but not others? Using prior evidence as a condition of funding has led to concerns that some well-funded and well-evaluated innovations might stifle support for other deserving programs that do not yet have evidence of their positive impact. There is considerable variation in the evidence base for the i3 winners, and the same will likely be true in the other initiatives. The extent of prior evidence can be codified by a group such as the federal What Works Clearinghouse. With consistent data on how much evidence exists about the past performance of each winner, we can determine whether our current standards for evidence predict future effectiveness.
Translating research into good policy and practice often requires a leap of faith. Now, we have the opportunity to make sure we land on our feet. Thanks to rigorous evaluations of the effects of social programs, we know that they are sometimes effective and sometimes not. We need to use the scale-up initiatives to help us learn why. These recommendations will provide an understanding of the characteristics of youths, settings, and resources that predict effects. Knowing all this will help policymakers and practitioners target and support effective programs. In the longer term, it will help us improve outcomes for all young people.
A version of this article appeared in the November 17, 2010, edition of Education Week as “Learning From Scale-Up Initiatives.”