What Does It Mean to Call a Program 'Evidence-Based' Anyway? (Opinion)


Fiona Hollands, Yuan Chang, & Venita Holmes

Fiona Hollands is a senior researcher in the Department of Education Policy & Social Analysis at Teachers College, Columbia University. Yuan Chang is a research assistant in the same department at Teachers College. Venita Holmes is a manager in the Houston Independent School District, Department of Research and Accountability.

Millions of federal dollars are currently streaming into schools to help address learning shortfalls related to the pandemic. While it’s not clear what strings might be attached to these new funds, recent versions of the Elementary and Secondary Education Act have encouraged and, for some uses, required education decisionmakers to invest federal funds in evidence-based activities. Many school leaders themselves want evidence to inform their choices. But how does that work in practice?

The current federal approach may seem either needlessly restrictive or absolutely sound, but the fact is the bar for what counts as evidence isn’t awfully high: The Every Student Succeeds Act, the latest ESEA iteration, asks for only one study showing that a program improves student outcomes. And if there isn’t one, a sound rationale for the activity and a plan to evaluate it will often suffice.

To figure out how well schools and districts are able to meet this directive, our research team worked with a large, Southern school system to determine what proportion of its Title I funds are spent on evidence-based activities. Our findings—which relate to prepandemic federal funds—reveal weaknesses both in the law and in the evidence base. As a result, we have some advice to share with educators who are doing their best to adhere to ESSA, serve students well, and now wisely spend large amounts of COVID-19 relief funds in record time.

First, let’s address the question of what should count as adequate evidence when choosing where to invest education funds. Does it really make sense to declare a program “evidence-based” if one study shows positive outcomes for students while a bunch of others don’t?

In our analysis, we first determined what percentage of Title I funds at the district office and four sampled schools were spent on programs and practices backed by as few as one positive study: 70 percent over a three-year period. This number would likely have been substantially higher if we had descended below the top three tiers of evidence laid out in ESSA to that nebulous world of activities for which there is a research-based rationale and an ongoing evaluation plan. To avoid this squishiness, we limited our definition of positive evidence to studies using one of three rigorous approaches for evaluating effectiveness: experimental, quasi-experimental, or correlational methods. Each study also needed to show gains for students of approximately the same age as those being served in the district.

Next, we considered all the relevant studies we could find in four research repositories: What Works Clearinghouse, Evidence for ESSA, Education Endowment Foundation, and Education Resources Information Center (ERIC). We agreed on a summary rating for each program or practice to reflect the overall body of evidence. Using these ratings, we found that only 50 percent of the funds were spent on evidence-based activities. The punch line: For the 20 percent of Title I investments that make up the gap between those two figures, the conclusion drawn from a single study runs counter to the conclusion drawn from multiple studies of the same intervention.

The takeaway for those of you holding the purse strings is this: Don't stop at the first positive study you find.

Ironically, the more often a program is studied, the less likely it is to be found effective in every study. This means that ESSA can inadvertently tip the scales toward less studied, but not necessarily more promising, interventions. So a second lesson: When the evidence is mixed, look carefully at what kinds of students were served in each study, which outcomes were improved, and what skills and resources were needed to implement the program. If these don't match your own context in at least some of the studies that found student improvements, be cautious about the likelihood that you can replicate positive results.

There’s also the problem that, for some applications of Title I funds suggested by ESSA, there just isn’t any evidence at all. This means that expecting 100 percent of Title I funds to be invested in evidence-based activities is an unreasonable goal. There are no rigorous studies, for instance, of many widely implemented and seemingly essential programs like student-data tracking and school nurses.

Most important, it’s worth questioning the apparent assumption that educators have the time and skills to search for and evaluate all this evidence. It took us a year to complete our study even though we’ve been trained extensively in how to do this kind of work. Who has that kind of time in a school or district office? It’s unrealistic and inefficient to expect educators and administrators in every school and district to thoroughly investigate each new practice they consider. It would be more helpful for state education departments and district central offices to provide clearer guidance to the decisionmakers they support on suitable investments of federal funds.

Some education agencies, such as the New York State Education Department, are already doing this. Right now, suggestions on how to address learning loss are desperately needed, and these need to be more specific than "high-quality tutoring." Detailed exemplars are called for to show what that looks like in practice when addressing different student needs.

Our research leads to one more major piece of advice. Even a well-curated list of evidence-based activities cannot guarantee effective implementation of promising practices. Schools may be more successful at scaling up effective programs they are already using, or at tweaking the implementation of their underperforming programs, than at introducing brand-new strategies, even if the evidence for those strategies is favorable. For example, one district was surprised to find that Reading Recovery was not producing positive results overall for its students despite ample research evidence supporting its effectiveness elsewhere. Instead of ditching the program, district officials carefully reviewed whether schools with the greatest number of struggling readers were getting an appropriate share of trained teachers.

Our own research team's evidence repository lists items currently in use in districts with which we work and aims to help decisionmakers know where to steer their efforts and their funding. It may behoove states and districts to put their energy into helping educators better implement potentially effective programs rather than expecting them to become researchers overnight.

A version of this article appeared in the June 16, 2021 edition of Education Week as What Does It Mean To Call a Program ‘Evidence-Based’ Anyway?