As per usual, the trolls at the New York Times have been following the conversations here at EdTechResearcher, and now they’re piggybacking on our work with an article on education research, randomized experiments, and the What Works Clearinghouse.
To review, last week, I got a little frumpy at two headlines summarizing results from a Department of Education/RAND study of a blended learning platform: Cognitive Tutor: Algebra. One headline said “Cognitive Tutor: Algebra Doubles Student Learning,” and I wrote a post that said, “sort of, but not always, and high schools may find lower performance in year 1 before gains in year 2, and we’re not totally sure for middle school.”
Then, something lovely happened: a whole bunch of people started a civil discussion about the research. Christina Quattrocchi, who wrote one of the headlines that I quibbled with, wrote an article further investigating the findings and interpretation of the study. John Pane, the study’s lead author, and I exchanged some emails in which he agreed with some of my interpretations and suggested alternative perspectives on others. A colleague of Betsy Corcoran, the founder of EdSurge, invited me to join them on a panel discussion (which I couldn’t do, but I hope to in the future). Steve Ritter, founder of Carnegie Learning, weighed in with his perspectives in a comment here.
The conversations pushed my thinking. Steve Ritter noted in his comment that I characterized the study as four smaller studies (two year-long studies in high schools and two year-long studies in middle schools), and he argued that they are best understood as two studies: one of the same high schools over two years and one of the same middle schools over two years. Steve’s point is quite valid; the fact that the study follows the same schools for two years is very important (I was highlighting something different, the fact that we have four sets of results to examine, which is also important). Steve and I both discussed the fact that there wasn’t quantitative, empirical evidence that proved that the improvement in high school outcomes from year 1 to year 2 was due to teacher improvement or practice changes. But to that point, John Pane notes in our email correspondence that the sample size of teachers was much smaller than the sample size of students, and therefore the study didn’t necessarily have the statistical power to detect those changes. He noted that other studies have identified the phenomenon of an “innovator’s dip,” and there aren’t really very many compelling alternative explanations for why scores would be low in year 1 and higher in year 2. Even though we don’t have great quantitative evidence from this study, it’s not crazy to think that the gains between year 1 and year 2 were because teachers got better. I think John has a good point here, if not a decisive one. (RAND has policies limiting publishing John’s remarks until the full study is published in a journal, which is reasonable. Otherwise, I’d publish our correspondence.) If I rewrote that blog post today, or made it longer, I think these would be worthy perspectives to dig into further.
Let me highlight one important thing about these conversations: we all agree on the same body of facts. The study, as far as I can tell, was very carefully done. Steve, John, and I are all turning to the same body of findings, the same statistical analysis, and using those findings as the shared text for a conversation. We then seem to have some disagreements (along with many areas of near-universal concurrence) about how to interpret those findings, how to communicate those findings, and how to help educational decision makers leverage those findings in their own contexts. The research absolutely helps advance our understanding of blended learning in mathematics. It informs conversations about what we should do. It doesn’t provide a straightforward answer for everyone.
When I was writing my first grant application as a doctoral student, my advisor, the economist of education Richard Murnane, changed some wording in a way that I now consider quite illuminating. I wrote that we were going to “answer these three research questions about excellence and equity.” Dick changed it to read that we would “address these three research questions.” In most cases, research addresses questions rather than answers them.
It’s that nuance that is missing from the recent New York Times piece on educational research. The article suggests that experimental research can remove the guess-work from education. But just like doctors still have to make informed judgments (“guesses”) rather than simply having computers automatically recommend treatments based on symptoms or vital signs, teachers and educators will still need to make informed judgments (“guesses”) using a wide variety of educational research. Research narrows the bounds of our debates, and it narrows the range of informed judgments. It addresses, rather than answers.
To some extent, public understanding of the affordances of research is set back by branding from the Institute of Education Sciences. The very name “What Works Clearinghouse” suggests that studies can definitively answer the question “What Works?” But even well-conducted randomized controlled trials that have statistically significant results can only tell us what works on average across a variety of typical settings. Most educators don’t need to answer “What Works?” They need to answer “What works, for whom, when, and under what conditions?” As the discussion on Cognitive Tutor: Algebra shows, the results from even our best studies often don’t lead to definitive answers for decision makers; they lead to better information to support more informed decisions.
So, should this send us into a tizzy of skeptical malaise? No. I’m not saying that we’ll never learn anything from research, or that everything is subjective, or that everyone should willy-nilly make decisions based on their gut or what worked for their kid. (Nor, by the way, did I claim in my original post that Cognitive Tutor: Algebra had no value or didn’t work; that claim is as wrong as the claim that it always doubles learning.) Good educational decisions should be built on a foundation of research. Each study is like a brick, which can be combined with other studies for a sturdy foundation. On top of that foundation, educators will make specific decisions for their specific community of students, aligned with the strengths, weaknesses, and cultures of their systems.
In fact, part of the problem is that we don’t have enough bricks. Many fields seeking to make improvements devote 5 to 15% of their resources to research and development. In education that figure is much lower; in 2003 it was closer to 0.01%.
So, do more research. Do lots of high quality experimental research when asking questions about comparative efficacy. Have good conversations about how to interpret findings. Train educators to read educational research and join those conversations. Ensure that important studies are not only published in technical language, but that summaries are published in accessible language for people without sophisticated methodological training. Support educators in making thoughtful, informed judgments about their specific circumstances based on the foundations of research and collaborative inquiry.
That’s the plan. Get to work everybody.
For regular updates, follow me on Twitter at @bjfr and for my publications, C.V., and online portfolio, visit EdTechResearcher.