In education research, there’s a drive to cut to the chase: What’s the effect on the classroom? How much better will students perform on the state math test using this curriculum? How many months of classroom progress can students gain by using that tutoring system? Usually education watchers make that interpretation based on a study’s reported effect and whether it is statistically significant, as judged by the p-value. Yet at the annual conference of the Association for Psychological Science here this weekend, statistics professor Geoff Cumming of La Trobe University in Melbourne, Australia, made a thoughtful and pretty persuasive argument that a single study’s result is not the certainty it is often taken for, and that considering more complexity can give us a more accurate view of how interventions really work.
A p-value in statistics represents the probability that a researcher would see a result at least as large as the one observed if the intervention actually had no effect, and generally the smaller it is, the better: a p-value of .05 or less is usually needed for a result to be considered statistically significant. As Cumming explains in this demonstration, the results of a given experiment can vary substantially with every repetition, and the “official” effect size doesn’t usually show that uncertainty.
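To get a feel for that variation, here is a minimal simulation sketch in Python. It is not Cumming’s own demonstration; the group size of 32, the true effect of half a standard deviation, and the function name `one_experiment` are all assumptions chosen for illustration. It repeats the same two-group experiment 25 times and prints how much the observed difference and the p-value move around.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_experiment(rng, n=32, true_effect=0.5):
    """Simulate one two-group study; return (observed difference, two-sided p-value)."""
    # Hypothetical setup: scores in standard-deviation units, same true effect every time.
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = mean(treated) - mean(control)
    # Standard error of the difference between two independent group means.
    se = sqrt(stdev(control) ** 2 / n + stdev(treated) ** 2 / n)
    # Large-sample normal approximation; a t-test would be slightly more conservative.
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, p

rng = random.Random(2024)  # fixed seed so the run is repeatable
results = [one_experiment(rng) for _ in range(25)]
p_values = [p for _, p in results]

for i, (diff, p) in enumerate(results, 1):
    flag = "significant" if p <= 0.05 else "not significant"
    print(f"replication {i:2d}: difference = {diff:5.2f}, p = {p:.3f} ({flag})")
```

Under these assumed settings, roughly half of the replications would be expected to clear the .05 bar even though the underlying effect never changes.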
That’s why the latest publication manual of the American Psychological Association calls for researchers to use estimation, such as confidence intervals, rather than relying simply on significance tests. For example, an estimated result might show that a reading program increased the end-of-course test scores of participating middle school students by 12 points, give or take 4 points, with 95 percent confidence.
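For readers curious where a “give or take” figure comes from, the sketch below shows one common way to compute it. The score gains are made-up numbers, the function name `confidence_interval` is hypothetical, and the calculation uses a normal approximation, which is an assumption; with a sample this small, a t-based interval would come out somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(gains, level=0.95):
    """Return (estimate, margin) so the interval is estimate +/- margin."""
    n = len(gains)
    estimate = mean(gains)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    se = stdev(gains) / sqrt(n)
    # Critical value from the standard normal distribution (about 1.96 for 95 percent).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, z * se

# Hypothetical test-score gains for 16 students (illustrative data only).
gains = [12, 3, 25, 8, 17, -2, 14, 20, 9, 11, 30, 6, 15, 1, 18, 13]

estimate, margin = confidence_interval(gains)
print(f"estimated gain: {estimate:.1f} points, give or take {margin:.1f}")
print(f"95% interval: {estimate - margin:.1f} to {estimate + margin:.1f}")
```

The margin shrinks as the sample grows, which is one reason a small study with a “significant” result can still leave a wide range of plausible effects.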
In a 2010 study, Cumming and his colleagues asked 330 psychologists, behavioral neuroscientists and medical researchers to interpret the findings of two fake studies with similar results, reported using either significance tests or confidence intervals. Cumming found researchers from all three fields were more likely to correctly interpret that the two studies had similar findings if they focused on the confidence intervals, while 60 percent of those who focused on statistical significance seemed to be tripped up by the fact that the effects of one study were considered “significant” and the other not.
Cumming also called for more focus on meta-analyses, in which researchers take a holistic view of the results of many studies over time; the sheer number and variety of samples in these analyses can help compensate for the chance of error in any one study.
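One simple way that pooling can work is sketched below. This is a fixed-effect, inverse-variance average, which is only one of several meta-analysis methods and not necessarily the one Cumming discussed; the four study results and the function name `pool_fixed_effect` are invented for illustration. Each study is weighted by how precise it is, so the combined estimate carries less uncertainty than any single study.

```python
from math import sqrt

def pool_fixed_effect(studies):
    """Pool (estimate, standard_error) pairs with inverse-variance weights."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total
    # The pooled standard error is smaller than that of any single study.
    pooled_se = sqrt(1 / total)
    return pooled, pooled_se

# Hypothetical results: (score gain in points, standard error) for four studies.
studies = [(12.0, 4.0), (5.0, 3.5), (9.0, 5.0), (15.0, 6.0)]

pooled, pooled_se = pool_fixed_effect(studies)
print(f"pooled gain: {pooled:.1f} points, standard error {pooled_se:.1f}")
```

A fixed-effect model assumes every study is measuring the same underlying effect; when programs or student populations differ, analysts often prefer a random-effects model instead.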
For educators, policymakers and other research watchers, it’s important to understand, when you look at any study, just how much variation there can be even in significant results.
A version of this news article first appeared in the Inside School Research blog.