The Limits of Peer Review
Much hope and hoopla presently surround the idea of "peer review" in education research and the wonderful things that are said to be in store for school effectiveness and student performance if only Americans would rigorously confine themselves to education programs and practices that have been vetted and judged effective in peer-reviewed studies.
This assumption pervades the recent "No Child Left Behind" Act of 2001, with its oft-repeated emphasis on "scientifically valid" methods, "scientifically proven" programs, and "research based" interventions. It's built into the House bill to convert the U.S. Department of Education's existing office of educational research and improvement, or OERI, into a new Academy of Education Sciences. Indeed, the legislation would require that "all published research, statistics, and evaluation reports conducted by or supported through the Academy, shall be subjected to rigorous peer review before being published." Assistant Secretary of Education Grover J. "Russ" Whitehurst, the academy's presumptive first director, is known for insisting that peer-reviewed research (and randomized experiments) is the only kind worth taking seriously.
Much the same view animates such new private groups as the Education Quality Institute and the Education and the Public Interest Center, whose director, University of Colorado professor of education Kenneth R. Howe, recently took me to task in these pages for insufficient reverence toward peer review. ("Free Market Free-for-All," Commentary, April 10, 2002.)
Reverent I'm not. Peer review has its uses, and I applaud them. But it also has severe limitations and warrants no florid curtsies.
Reduced to its essence, peer review means asking a researcher's scholarly peers to examine his work and determine whether, in their view, it has merit or displays worrisome glitches and needs further work. A tougher, rarer version asks those peers whether, upon examining the same data, they reach the same conclusions as the first analyst. Yet another form, used by some academic journals, federal agencies, and private funders, asks outside reviewers to advise the editor (or bureaucrat or philanthropist) whether an article deserves to be published or a study or project to be funded.
Such questions are often worth asking, and the opinions of colleagues and outside experts are frequently worth heeding. At its best, submitting one's work to others who are knowledgeable about the subject at hand can yield constructive feedback and thoughtful advice. I'm a frequent user of such feedback loops in my own writing and in the work of the Thomas B. Fordham Foundation. I'm also a "supplier." Every couple of weeks, I get asked to comment on manuscripts being considered for publication, projects under review for possible funding, and so on.
I'm all for it. But reverence is something different. In my experience, peer review is better seen as a helpful source of improvement, a second opinion that may or may not accord with the first one, an augmentation of brainpower, prose prowess, and aggregate experience, than as the supreme arbiter of the truth. Editors and funders may defer to the peers' judgments, rather than their own, but that doesn't make such judgments wiser, truer, or more perceptive. Indeed, those who yield such decisions to outsiders may simply be shirking responsibility for them. In the process, they may compromise their own publication's or organization's mission or blur its focus. And they may foster a false sense of confidence or trust on the part of readers or consumers.
The innards of a peer-review process are not unlike the legislative process—messy, complex, full of compromises, muddled rules, and procedural dilemmas. Who picks the peers? From what pool? By what criteria? How is their assignment framed? Do they know whose work they're reviewing? Do they set the criteria for their reviews? Are they judging an isolated study (or project proposal), or do they have alternatives to compare? What happens when reviewers disagree? Does the majority rule? Does each reviewer have a veto? Do those with longer résumés get more votes? Who parses their multifarious and perhaps contradictory comments to determine their "bottom line" recommendation? And on and on.
Education isn't the only field where such complications—and possible corruptions—arise. It also happens regularly in disciplines that educators envy. My wife is a medical researcher, fairly prominent in national and international cardiology circles. She often publishes in peer-reviewed journals, presents her research at peer-reviewed conferences, chairs scientific sessions at such conferences, serves on editorial boards and National Institutes of Health funding panels, and gets enlisted as a peer reviewer herself for grants, reports, and articles under consideration by "refereed" journals.
This may sound like research paradise, but her tales of how it actually works are enough to cool the ardor of the lustiest booster of peer review. Individual reviewers often have conflicts of interest—owning stock in, or consulting for, the companies whose studies or products they are examining. They are driven by friendship, loyalty, or envy to endorse certain scholars and institutions and pan others. They have their own interests to advance, knowing well that those they review today will be reviewing them tomorrow.
Nor are these processes anonymous. Even when authors' names are withheld, reviewers usually make out who wrote what. The number of experts on a particular topic is typically small, and they usually know what the others think, what kinds of work others are engaged in, even how others write. All have their own professional reputations to uphold, meaning that if somebody else reports positive results from a procedure, device, or drug that one has previously criticized, there's a strong temptation to let one's comments be colored by one's own involvement with the topic. Moreover, they simply disagree about what's important, about what investigative approaches make sense, about cost-benefit calculations, about what constitutes adequate research protocols—many cardiology researchers are content with six-month results from an intervention, for example, while my wife favors at least a year's worth of data—and about the urgency of investigating one issue rather than another.
Similar differences characterize the work under review. Those doing the actual research make all sorts of judgment calls as they proceed, decisions that are influenced by their own beliefs and priorities (and energies and resources) and that may not be shared by others—including their eventual reviewers.
And that's medicine, upon which education so often gazes with longing. Do not doubt, for example, that the architects of the proposed Academy of Education Sciences have their eye on the National Institutes of Health (which, incidentally, have done some first-rate education research, particularly on early reading). Perhaps they haven't spent much time on those NIH funding panels where the log-rolling occurs, where people look out for their protégés, channel money to their friends, and make trouble for their enemies and rivals, where scarce dollars are apportioned for all sorts of reasons, among which scholarly merit is only one. Bear in mind, too, that one reviewer may think a given project or report has scientific merit, yet, for other reasons, assign it low priority. If one is obsessed with molecular studies of atherosclerosis, for example, one may give short shrift to clinical cardiology research. And vice versa. If one believes a breakthrough is near in bladder cancer—or one's cousin has bladder cancer—one may give lower ratings to an otherwise excellent study of prostate cancer.
Does peer review behave similarly in physics, geology, and psychology? In sociology and economics? I'm not certain. But I'd be mighty surprised to learn that it's immune to such frailties.
Education research is frailer still, because the field lacks many of the experimental traditions and scientific norms that other disciplines espouse. It has few generally agreed-upon standards for what constitutes "good" research—and is riven by differences far more fundamental than the choice between six- and 12-month intervention data. What educators solemnly term "qualitative" research, for example, would be laughed out of the editor's office at the New England Journal of Medicine. Nobody would even bother having it peer-reviewed, for they would instantly recognize that all peer judgments would hinge mainly on the reviewers' values, preferences, and priorities, not scientific merit or analytic accuracy. Congress can legislate all it likes about what "scientifically valid" means, but that doesn't govern the belief structures of the American Educational Research Association or the editorial values of the Teachers College Record.
Let me be clear. As with data-driven studies, "qualitative" work can often be improved by the scrutiny of others. Second and third opinions are frequently beneficial. But let's not pretend that there's something neutral, objective, or scientific about them. The offerers of second and third opinions can make the same mistakes or have their views colored by the same biases. The choice of reviewers often preordains their bottom-line judgments. Those selecting them are well aware of this. Which doesn't render the review process useless. It simply means that key decisions should stay with the cognizant editor, funder, or consumer.
Arguably the hardest thing I did when I held Russ Whitehurst's job in the 1980s was try to make the OERI's peer-review process work in ways that yielded useful input for decisions without being captured by each field's old boys'/girls' networks, people who could be counted upon to ensure that nothing significantly different would ever be tried. I also had to second-guess some of the career staff's own preferences and friendships, which usually overlapped the old-boy networks. No doubt some will say I abused the peer process; I would say that I strove to protect the OERI's work from the most common forms of abuse.
What about more scientific forms of education research, such as true experiments, careful quasi-experimental studies, and meta-analyses? One certainly hopes that today's push for "scientific validity" will lead to more of these, and there's no doubt that peer review can give a boost to such movement.
But let's not wear rosy lenses here, either. Those who despise vouchers—and some who like them—find fault with recent experimental studies of privately funded vouchers in several cities. They cite methodological doubts and analytic uncertainties, but it's just as likely that their motives are political and their differences philosophical. (The thing about peer review is that the one can so easily masquerade as the other.) Similarly, those who dislike phonics in primary reading have noisily faulted the National Reading Panel for its criteria for deciding which among a million studies have scientific merit. That panel engaged in what amounted to an elaborate peer-review process—that was its mandate—but this did not immunize its findings from charges of bias and favoritism, charges hurled not by those with other views of science, but by those with different opinions about reading!
Kenneth Howe criticized me for observing that the peer-review emperor wears tattered raiments. He balked at my suggestion, in an earlier Education Week article, that "by selecting the peers, you're preordaining the outcome of the review," and was apoplectic over my acknowledgment in the same article that the Fordham Foundation sees its research mission as engaging in rather than refereeing arguments about education policy. ("Researching the Researchers," Research, Feb. 20, 2002.) He strove to explain his own murky claim that, whilst education research "can never be free of the commitment to some value-laden framework, it can, and often does, address questions that are neutral among value-laden frameworks."
Got that? After struggling mightily with the ancient epistemological question of whether education research can be truly "objective," he concluded that surely it must be. Because it ought to be. Because society would be better off if it were.
Well, maybe. But it doesn't work that way in medicine, and I don't believe it works that way in economics or astronomy. Even when people agree on data, they disagree on the interpretation. They disagree about what data are important. They disagree about what reforms are needed, hence what data deserve attention. They disagree about statistical significance. About the rigor of other people's analyses. About the adequacy of one's controls for other possible explanations of a difference. And, above all, about what education is, how it ought to be delivered, and on what basis.
In such an environment, is research still worth doing? Absolutely. Can it be done better? To be sure. Are second opinions and external reviews valuable? Mostly, yes. But does peer review deserve to be deified as the one true god of education research? I think not. We should never cease in our quest for surer evidence about what produces what results under what circumstances. "Scientifically proven" methods are usually better than guesswork, both in the classroom and in the legislative chamber. And in some parts of education, the science is a lot more robust than in others. But let's not kid ourselves. Science informs policy decisions, it doesn't make them. In a field as value-laden as education, we do nobody a favor to pretend otherwise. And in a democratic society that is wary of deferring overmuch to experts, we probably would not like the consequences of an education system shaped by the opinions of those who style themselves "scientists"—a population that is bound to grow by dubious leaps and ersatz bounds as greater deference is paid to their views.
I keep recalling William F. Buckley's quip about preferring to be governed by the first hundred names in the Boston phone book than by the Harvard faculty. He could have said much the same thing about education.
Vol. 21, Issue 34, Pages 30, 34. Published in print: May 8, 2002, as "The Limits of Peer Review."