Achievement Gains in Voucher Research
Earlier this month in these pages, William G. Howell, Paul E. Peterson, Patrick J. Wolf, and David E. Campbell defended their voucher research by responding to critics. ("In Defense of Our Voucher Research," Commentary, Feb. 7, 2001.) As one who has found fault with their two-year, three-city study comparing pupils who received vouchers with those who did not, I am still not convinced. Here are just two of the reasons why.
Messrs. Howell, Peterson, Wolf, and Campbell claim that we should look at overall gains by ethnic group rather than grade-by-grade results.
This would be a reasonable demand if the pattern of gains were consistent across grades, and if gains were statistically significant in at least several grades in each city. In New York City, as reported by Mathematica Policy Research, the contractor on the study, only the one cohort of African-Americans who used vouchers to attend private school 5th grades in the study's first year and 6th grades in the study's second year showed significant test-score gains when compared with the same cohort who did not receive vouchers. Their entire gain was made in the study's first year.
In Dayton, Ohio, according to Mr. Howell, who was kind enough to provide me unpublished data by grade, African-American voucher recipients finishing private school 3rd grades at the end of the study's second year made very large two-year gains (in combined math and reading). Those finishing 4th grade performed slightly worse than pupils who did not get vouchers. Those finishing private school 5th grades made large gains; those finishing private school 6th grades, small losses; those finishing 7th grade, large gains; and those finishing 8th and 9th grades, large losses compared with nonvoucher students. Only in Washington were achievement gains of voucher recipients attending private schools relatively consistent across grades.
By putting all African-American students together, the researchers get a positive, significant private school effect in each city; but in two of the cities (New York and Dayton), combining results obscures possible cohort effects that may have little to do with differences between public and private education.
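To see how pooling can hide this kind of pattern, consider a minimal sketch in Python. The per-grade effects and sample sizes below are invented purely for illustration; they are not the Dayton or New York figures, which I have described above only in qualitative terms. The point is simply that a pooled average can come out positive even when more than half the grades show losses.

```python
# Hypothetical per-grade voucher effects (in test-score points) and
# hypothetical sample sizes. These numbers are invented for illustration
# and are NOT data from the voucher study.
grade_effects = {3: 12.0, 4: -1.0, 5: 9.0, 6: -2.0, 7: 10.0, 8: -8.0, 9: -9.0}
grade_sizes = {3: 40, 4: 35, 5: 35, 6: 30, 7: 25, 8: 15, 9: 10}


def pooled_effect(effects, sizes):
    """Sample-size-weighted average of the per-grade effects."""
    total = sum(sizes.values())
    return sum(effects[g] * sizes[g] for g in effects) / total


pooled = pooled_effect(grade_effects, grade_sizes)

# Grades in which voucher students did worse than nonvoucher students.
negative_grades = sorted(g for g, e in grade_effects.items() if e < 0)

print("pooled effect:", round(pooled, 2))
print("grades with losses:", negative_grades)
```

A single positive pooled number here says nothing about why some cohorts gained and others lost, which is the question the grade-by-grade results raise.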
A second and even more serious issue that clouds the research is nonresponse. About 40 percent of voucher recipients and nonrecipients in New York City, and about one-half in Washington and Dayton, failed to show up for the follow-up first- and second-year tests. The researchers made an attempt to weight the scores of those who did show up to make them representative of the overall sample. They argue that they are able to "fill in" missing second- and third-round test scores implicitly, based on the personal data gathered on voucher recipients and nonrecipients in the first, baseline round of testing.
But students who did not show up for the second and third rounds of testing may have stayed away based on how they performed during the school year, not just based on their personal characteristics and baseline test scores, and may have done so differently in the voucher-recipient and -nonrecipient samples. Clearly, many "unobservable" variables are important in predicting how a student performs on a particular test on a particular day, including how he or she expects to perform. When half the original sample in an experiment does not show up for the post-test, and we can only predict part of the variation of the outcome variable to fill in for the missing observations, we really have little idea whether the estimated gain scores are biased or not.
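The concern can be illustrated with a small simulation, offered as a sketch under stated assumptions and not as a reconstruction of the researchers' actual weighting procedure. Everything in it is hypothetical: the true voucher effect is set to zero, each student's gain is an unobserved "shock" unrelated to the baseline score, and voucher students who had a bad year are assumed to be much less likely to return for the follow-up test (20 percent versus 80 percent), while control students return at a flat 50 percent. Responders are then reweighted by baseline score, the kind of observable information available from the first round.

```python
import random


def simulate(n=20000, seed=0):
    """Generate hypothetical students; the true voucher effect is zero."""
    rng = random.Random(seed)
    rows = []
    for group in ("voucher", "control"):
        for _ in range(n):
            baseline = rng.gauss(0, 1)   # observed at the baseline round
            shock = rng.gauss(0, 1)      # unobserved; how the year went
            gain = shock                 # same process in both groups
            if group == "voucher":
                # Assumed: attendance at the follow-up depends on the
                # unobserved shock, not on anything measured at baseline.
                p_respond = 0.8 if shock > 0 else 0.2
            else:
                p_respond = 0.5
            responded = rng.random() < p_respond
            rows.append((group, baseline > 0, gain, responded))
    return rows


def weighted_gain(rows, group):
    """Mean gain among responders, reweighted by baseline cell (high/low)."""
    num = den = 0.0
    for high in (True, False):
        cell = [r for r in rows if r[0] == group and r[1] == high]
        resp = [r for r in cell if r[3]]
        weight = len(cell) / len(resp)   # inverse of the cell response rate
        num += weight * sum(r[2] for r in resp)
        den += weight * len(resp)
    return num / den


rows = simulate()
estimated_effect = weighted_gain(rows, "voucher") - weighted_gain(rows, "control")
print("estimated effect after weighting:", round(estimated_effect, 2))
print("true effect: 0.0")
```

Under these assumptions the weighted estimate comes out clearly positive even though no true effect exists, because the weights adjust only for what was measured at baseline. Whether anything like this happened in the three cities is exactly what cannot be known from the weighted results alone.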
The nonresponse rate increases between the first and second follow-up tests, especially in Washington. There, almost 25 percent fewer students participated in the second follow-up than in the first. Gains are particularly volatile in Washington among middle school students. They suffered large losses in scores at the end of the first year and made enormous gains in the second. Was this due to a fraction of very low-scoring voucher students leaving the sample between the first and second years? Did some leave private school, or just not show up for the test? The researchers could easily answer these questions with further analysis.
Indeed, they could go one better. They could release their data so that other researchers could subject the data to the scrutiny they deserve. This is exactly what one of the researchers, Paul Peterson, demanded of University of Wisconsin-Milwaukee professor John Witte in 1996, when it came to data from the Milwaukee voucher experiment. Once Mr. Peterson got the data, his analysis was different from Mr. Witte's. Subsequently, Princeton University researcher Cecilia Rouse reanalyzed the data yet again and got another set of results. Such debates among researchers studying the same data are not only healthy; they are absolutely crucial to developing good public policy.
It is distinctly possible that a few hundred low-income African-American students, randomly drawn from families highly dissatisfied with Washington's public schools and placed in one of the District of Columbia's existing private schools with a group of higher-performing students, will make gains when compared with students who remain in public schools. It is also possible that these gains are partly due to characteristics of the private schools, not just the new set of peers surrounding the sample of voucher students. Right now, though, even this case—the most plausible based on the available results from the three cities—is still sufficiently riddled with potential estimation bias that it may or may not reflect the "true" effect of using a voucher in a private school.
Messrs. Howell, Peterson, Wolf, and Campbell cite Carolyn Hoxby's description of randomized field trials as the "gold standard" of social science research, and want to bask in its glow. I agree with Ms. Hoxby's characterization. But to the extent that the researchers' results depend upon instrumental variables and upon weighting to correct for nonparticipation, they are no longer reporting results of a randomized field trial, but rather are reporting an empirical study with many of the dangers of assuming the relevance of characteristics that the "gold standard" attempts to avoid.
The Tennessee class-size study, to which they liken their own, suffers from similar problems, but to a lesser extent. In the Tennessee STAR study, all initial participants had identifiers, and could be tracked subsequent to their participation, even if they dropped out of the study, provided they remained within the state. As virtually all students who left the experimental schools and classrooms did remain in Tennessee, the instrumental variable of initial assignment could be utilized without the necessity of a more speculative weighting based on presumed similarities in family characteristics. Because Messrs. Howell, Peterson, Wolf, and Campbell had no similar way of tracking those who dropped out of their studies, their abandonment of the "gold standard" is necessarily more severe.
Martin Carnoy is a professor of education and economics at Stanford University. His original critique of the Howell, Peterson, Wolf, and Campbell research appeared in The American Prospect, Jan. 15, 2001. A more detailed version is available from the Economic Policy Institute in Washington.
Vol. 20, Issue 24, Page 31