AERA Stresses Value of Alternatives to 'Gold Standard'
Experiment not only route to solid findings, panel says.
At a time when federal education officials are holding up scientific experiments as a gold standard for studies in the field, a report by the nation’s largest education research group suggests there are also other methods that are nearly as good for answering questions about what works in schools.
The report, produced by a committee of scholars of the Washington-based American Educational Research Association, was released April 11 at the group’s 88th annual meeting here. It highlights ways in which researchers can use large-scale data sets, such as those maintained by the U.S. Department of Education, for analyzing cause-and-effect questions in education.
“It’s not a question of either-or,” said Barbara Schneider, the chair of the committee and a professor of educational administration and sociology at Michigan State University in East Lansing. “It’s about the importance of building capacity in our field. We need researchers who can use a variety of methods to answer appropriate research questions.”
Under the Bush administration, the Education Department has been promoting randomized-control trials in a campaign to transform education into an evidence-based field, not unlike medicine, and improve the quality of education research.
The AERA scholars, in their report, don’t argue with the value of rigorous experimentation for making causal inferences. Yet, they note, there are also times when such studies, which can involve randomly assigning students or classrooms to either an experimental or a control group, are not feasible or ethical. To test what happens when students repeat a grade, for instance, researchers can’t ask schools to randomly hold back some students while promoting others.
“Randomized-control trials are the gold standard,” said Richard J. Shavelson, a study author and a professor of education and psychology at Stanford University. “But they have limitations, and there are a lot of excellent data sets available that can also be used and that don’t necessarily fit with the randomized-control-trial model.”
The problem with some studies that draw on large-scale observational data sets, such as the Education Department’s High School and Beyond Study or the National Education Longitudinal Study, is that researchers fail to statistically account for differences between subjects in the groups under study.
One example: Students who attend private schools might come from wealthier homes, and start out with greater educational advantages, than their public school counterparts. Such differences make simple comparisons between the two groups suspect.
In recent decades, though, with advances in high-speed computing and the importing of research techniques pioneered in fields such as economics, reliable methods for reducing potential biases between study groups have become more accessible to education researchers. In their report, the AERA researchers highlighted four such methods:
- Fixed-effects models, which involve adjusting for unmeasured characteristics that don’t change over time, such as the impact of a mother’s personality on children in the same family;
- Testing for instrumental variables, which are characteristics that should be linked with the treatment but should affect the outcome only through the treatment;
- Propensity scoring, a method that calls for building statistical profiles that predict the probability that individuals with certain characteristics will be part of a treatment group and testing results against alternative hypotheses; and
- Regression-discontinuity analyses, a technique in which researchers compare subjects that fall just below or just above some cutoff point, such as a proficient level on a standardized test.
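As a rough illustration of the last of these methods, the sketch below applies a simple regression-discontinuity estimate to made-up data. It is not drawn from the AERA report: the cutoff, the program effect, the test scores, and the function names `fit_line` and `rd_estimate` are all hypothetical, and fitting a straight line on each side of the cutoff is only one of several ways such an analysis might be done.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Estimate the jump in outcomes at the cutoff.

    Fits one line to subjects just below the cutoff and another to
    subjects just above it, then compares the two predictions at the
    cutoff itself.
    """
    pairs = list(zip(scores, outcomes))
    below = [(s, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    slope_b, icpt_b = fit_line(*zip(*below))
    slope_a, icpt_a = fit_line(*zip(*above))
    return (slope_a * cutoff + icpt_a) - (slope_b * cutoff + icpt_b)


# Hypothetical data: students scoring at or above 50 on an entry test are
# placed in a program that raises a later outcome by 5 points. The outcome
# also rises with the entry score, which is what misleads a naive comparison.
CUTOFF = 50
TRUE_EFFECT = 5.0
scores = list(range(30, 71))
outcomes = [
    0.5 * s
    + (TRUE_EFFECT if s >= CUTOFF else 0.0)
    + (0.2 if s % 2 else -0.2)  # small deterministic stand-in for noise
    for s in scores
]

# Naive comparison: everyone in the program versus everyone outside it.
in_program = [y for s, y in zip(scores, outcomes) if s >= CUTOFF]
not_in_program = [y for s, y in zip(scores, outcomes) if s < CUTOFF]
naive_gap = mean(in_program) - mean(not_in_program)

estimate = rd_estimate(scores, outcomes, CUTOFF, bandwidth=10)

print(f"naive gap between groups: {naive_gap:.2f}")
print(f"regression-discontinuity estimate: {estimate:.2f}")
```

In this invented example the naive gap comes out far larger than the 5-point effect built into the data, because students in the program started with higher scores, while the estimate at the cutoff lands close to 5.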
While all four methods have their own drawbacks, the researchers say, they also represent an improvement over most of the techniques, such as simple correlational studies, that researchers relied on to make sense of the data in those large-scale studies. Researchers need to know which methods are appropriate for answering which kinds of questions, according to the report.
“There are lots of people who, with limited information, try to make causal inferences leading to major policy directions,” said William H. Schmidt, a study author and an education professor at Michigan State. “This is an attempt to say there are principles for this.”
Speaking to the Field
Titled “Estimating Causal Effects: Using Experimental and Observational Designs,” the 142-page consensus report was produced by the research group’s grants board, an expert panel created 17 years ago with the aim of building the field’s capacity for conducting quantitative analyses.
Panel members said they undertook the study project in 2003 at the behest of the National Science Foundation, which along with the Education Department’s National Center for Education Statistics underwrites the AERA grants board’s work. Leading methodologists outside the board also reviewed and critiqued drafts of the study, the authors said.
Panelists said the report, which the research group hopes to make available for free on its Web site, can provide guidance to policymakers and the news media as well as to colleges of education and other researchers.
“One of the things this monograph does is it really speaks to our field,” said Anthony S. Bryk, a Stanford education professor who commented on the panel’s recommendations at the April 9-13 meeting. He noted that many policy analyses in education are now done by scholars outside the field, such as economists or think tank researchers.
“We need people who can do this kind of work at the same level of expertise and skill as people in schools of public policy,” Mr. Bryk said, “but who want to work in colleagueship with people who have deep understanding about schools and how they work.”
“Otherwise,” he added, “we’ll end up with elegant studies that reach wrong conclusions.”
Vol. 26, Issue 33, Pages 12-13
Published in Print: April 18, 2007, as AERA Stresses Value of Alternatives to 'Gold Standard'