In early September, we reported the effect of school vouchers on student test scores in New York City; Dayton, Ohio; and the District of Columbia. The findings from these studies were remarkably consistent. We observed no positive effects of vouchers on the test scores of any group other than African-Americans. But after two years, African-Americans in all three cities who switched from a public to a private school scored about 6 percentile points higher than comparable students who remained in public school.
Results varied only slightly by location. In New York, African-Americans scored 4 percentile points higher than the control group. In Dayton and Washington, they scored 6 and 9 percentile points higher, respectively. In all three cities, impacts on the overall test performance of African-Americans were statistically significant.
All three evaluations were designed as randomized field trials, what Harvard University economist Caroline Hoxby calls the “gold standard” of social science research. But since the release of our report, various criticisms have been raised. These questions about our study deserve a response.
Alex Molnar, a professor of education at the University of Wisconsin-Milwaukee, and Charles Achilles, a principal investigator of the Tennessee Student Teacher Achievement Ratio, or STAR, experiment, argued in an Education Week Commentary that our report of average effects “tend[s] to conceal inconsistent findings ... [making] the achievement impact reported appear more generalized than it is.” (“Voucher and Class-Size Research,” Commentary, Oct. 25, 2000.) They then provided a variety of examples of varying results from one grade to the next, one subject to the next, one year to the next, and one city to the next. On the basis of these fluctuations, they concluded that the findings were problematic. Writing in The American Prospect, Stanford University education professor Martin Carnoy has made much the same suggestion.
But do average effects “conceal inconsistent findings”? On the contrary, averages simply take into account all the specific observations that have been made. When one breaks data sets into small fragments (by grade level, type of test, city, and so on), perturbations often appear. Yet much is to be learned from looking at measures of central tendency. Suppose we were interested in learning about differences in the December climates of Madison, Wis., and Palo Alto, Calif. On some winter afternoons, Madison may enjoy temperatures above 40 degrees, while thermometers in Palo Alto on occasion drop below 30 degrees. It would be absurd, though, to conclude from such instances that one could survive a Wisconsin winter in California clothing. Measures of central tendency are equally appropriate in voucher research.
Noted education statistician Anthony Bryk and his colleagues make much the same point when they recommend that conclusions about impacts of a school intervention not be drawn from “single grade information.” “Judging a school by looking at only selected grades can be misleading,” they write. “We would be better off, from a statistical perspective, to average across adjacent grades to develop a more stable estimate of school productivity.”
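The statistical point here can be made concrete with a small simulation. The numbers below are invented for illustration (they are not the study's data): a single true effect, measured in many small subgroup cells, produces estimates that bounce around cell by cell, while the pooled average stays close to the truth.

```python
import numpy as np

# Hypothetical illustration: one true effect of +6 percentile points,
# measured in many small subgroup cells (grade x city x subject) with
# sampling noise. Individual cells fluctuate; the pooled average is stable.
rng = np.random.default_rng(0)
true_effect = 6.0
n_cells = 12          # e.g., 3 cities x 2 grades x 2 subjects
n_per_cell = 40       # students per cell

# Each cell's estimate is the true effect plus noise that shrinks
# only with the (small) cell size.
cell_estimates = true_effect + rng.normal(0, 15, size=(n_cells, n_per_cell)).mean(axis=1)

print("cell estimates:", np.round(cell_estimates, 1))   # vary widely across cells
print("pooled average:", round(cell_estimates.mean(), 1))  # close to the true effect
```

The spread across cells is exactly the kind of "perturbation" the passage describes; it reflects small-sample noise, not genuine inconsistency.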
When Mr. Molnar and Mr. Achilles turn to the study of class size in Tennessee, they adopt the approach we recommend. They look for overall trends, not subgroup fluctuations. Instead of focusing on effects by school district or by grade level or by type of test taken, they report the average effects of class-size reduction on students in grades K-3. We only hope that, in the future, Mr. Molnar and Mr. Achilles will interpret our findings in the same way they present the class-size results from Tennessee.
When they do so, they will discover that the size of the effects of vouchers on African-American students after two years is approximately the same as the effects of the class-size experiment on black students in Tennessee. Both the class-size and voucher evaluations estimate impacts of roughly one-third of a standard deviation, a moderately large effect.
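The "one-third of a standard deviation" metric is a standardized effect size. As a rough sketch of how such a figure is computed (with entirely made-up scores, not the study's data), one divides the difference in group means by the pooled standard deviation:

```python
import numpy as np

# Illustrative only: simulated test scores in which the treatment group
# scores 5 points higher against a standard deviation of 15,
# i.e., an effect of roughly one-third of a standard deviation.
rng = np.random.default_rng(1)
control = rng.normal(50, 15, 300)     # hypothetical control-group scores
treatment = rng.normal(55, 15, 300)   # hypothetical voucher-user scores

pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
effect_size = (treatment.mean() - control.mean()) / pooled_sd
print(f"effect size: {effect_size:.2f} standard deviations")
```

Expressing impacts in standard-deviation units is what allows the voucher and class-size results to be compared directly.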
Does this mean that one should never look beyond an overall average? No, not when one finds systematic differences among subgroups. For example, we found no systematic effects of vouchers on the test scores of non-African-Americans, even while regularly finding positive effects on African-American test scores. So far, however, we have not found other differences that appear with the same regularity. On the contrary, the other fluctuations in the data set seem as random as an unusually warm day in Madison.
Other commentators have raised questions about those who were offered vouchers but chose not to use them. Mr. Carnoy expressed his concern to The New York Times “that there are all sorts of nonmeasurable characteristics of these kids that made it difficult for them to get into these private schools. Even if they got vouchers, they might not have been able to pay the other costs associated with private schools, and even if they could pay, they might not have been able to get into a private school.”
It is true that only about half the students took the voucher that was offered to them (the takers) and about half did not (the decliners). As we discuss in our reports, takers and decliners differed in a number of respects. Most notably, takers had higher family incomes in New York City and Washington, but lower incomes in Dayton. The New York and Washington findings are not surprising, given that the voucher awards did not cover all the costs of a private education. These additional costs were the reason most frequently given by families for not using the voucher. Presumably, the take-up rates would rise if the monetary value of the vouchers were increased.
Nevertheless, the fact that takers and decliners differ in income and other respects does not bias our estimates of the impacts of vouchers on test scores. The widely used “instrumental variable” technique that we employ effectively adjusts for these differences. This analytical technique takes advantage of the fact that vouchers were offered at random. It was first used in medical research, is now commonplace in econometric studies, and was employed by Alan Krueger in his study of the effects of class size on student performance in Tennessee, the very study that many of our critics hold in such high regard.
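The logic of the instrumental-variable adjustment can be sketched in a few lines. The simulation below is hypothetical (the authors' actual estimation is not reproduced here): the randomized offer serves as the instrument, and the Wald ratio divides the intention-to-treat difference by the difference in take-up, recovering the effect of actually attending even though takers differ from decliners.

```python
import numpy as np

# Simulated illustration of the Wald (instrumental-variable) estimator.
rng = np.random.default_rng(2)
n = 2000
offer = rng.integers(0, 2, n)        # Z: randomized voucher offer
ability = rng.normal(0, 1, n)        # unobserved family characteristic

# Take-up depends on the offer AND on the unobserved characteristic,
# so takers are not a random subset of those offered:
attend = (offer == 1) & (rng.random(n) < 0.5 + 0.1 * (ability > 0))

# True effect of attending is +6 points:
score = 50 + 5 * ability + 6 * attend + rng.normal(0, 10, n)

# Wald estimator: (outcome difference by offer) / (take-up difference by offer).
itt = score[offer == 1].mean() - score[offer == 0].mean()
takeup = attend[offer == 1].mean() - attend[offer == 0].mean()
iv_estimate = itt / takeup
print(f"IV estimate of attending: {iv_estimate:.1f} points")
```

Because the offer is random, the numerator and denominator are both unbiased comparisons, and their ratio is unaffected by the ways takers and decliners differ.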
Critics have also pointed out that not everyone in the treatment and control groups continued to participate in the evaluations two years later. This problem is encountered by virtually all evaluations of social interventions. New York Times education writer Richard Rothstein, however, believes that the problem is sufficiently acute in our study to “make the results meaningless.” The facts suggest otherwise. First, the response rates of the treatment and control groups are comparable. Second, the baseline characteristics of students who attended follow-up testing sessions differed very little from those who skipped them. And finally, to correct for the minor differences that were observed, we weighted the data based upon the probability that each child, given her family background and initial test scores, would attend follow-up sessions.
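The reweighting step described above is a standard inverse-probability correction. The sketch below is hypothetical (a single made-up binary covariate stands in for family background and baseline scores): responders are weighted by the inverse of their estimated probability of returning, so the retained sample again resembles the original one.

```python
import numpy as np

# Simulated illustration of inverse-probability weighting for attrition.
rng = np.random.default_rng(3)
n = 1000
low_income = rng.integers(0, 2, n).astype(bool)          # hypothetical baseline covariate
score = 50 + 10 * (~low_income) + rng.normal(0, 5, n)    # higher-income students score higher

# Suppose low-income families are much less likely to return for follow-up testing:
responded = rng.random(n) < np.where(low_income, 0.5, 0.95)

# Estimate the response probability within each covariate cell,
# then weight each responder by the inverse of that probability:
p_hat = np.where(low_income, responded[low_income].mean(), responded[~low_income].mean())
weights = 1.0 / p_hat[responded]

naive_mean = score[responded].mean()     # tilted toward higher-income responders
weighted_mean = np.average(score[responded], weights=weights)
print(f"full-sample mean:          {score.mean():.1f}")
print(f"unweighted follow-up mean: {naive_mean:.1f}")
print(f"weighted follow-up mean:   {weighted_mean:.1f}")
```

The unweighted follow-up mean drifts toward the groups that responded more often; the weighted mean restores the original sample's composition.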
Although our critics concede the benefits of randomized field trials, they lament that our evaluations are not “blind,” as in medical experiments that randomly give patients pills or placebos. Yet scientific evaluations of many medical interventions—heart bypasses, mastectomies, and others—are no less transparent than the voucher interventions we evaluated. Nor was the Tennessee class-size study, which Mr. Carnoy, Mr. Rothstein, and Messrs. Molnar and Achilles find so persuasive.
Mr. Rothstein argues that transparency poses a particular problem for voucher research, because those who do not win the lottery are so “sorely disappointed” that they help their children less, causing control-group test scores to drop. Our findings, says Mr. Carnoy, may well represent an “adverse ‘disappointment’ effect,” rather than any positive gain associated with attending a private school. Our data, however, simply do not support these conjectures. We found no difference in the help given children by parents in the test and control groups. And surveys show no evidence that the students in the control group became unhappy with their schools in the year following the voucher lottery. Moreover, it remains unclear how the Rothstein or Carnoy hypothesis explains the fact that African-Americans who switched from public to private schools showed gains, but other ethnic groups did not. Were African-Americans who did not receive a voucher the only applicants who were disappointed?
As a corollary to his negative-treatment hypothesis, Mr. Carnoy suggests that our findings probably represent what researchers call Hawthorne effects. When looking at parental-satisfaction rates in some cities (especially New York), one does discover an initial burst of enthusiasm followed by more temperate assessments, lending credence to Mr. Carnoy’s intuition. (Interestingly, the Tennessee STAR study only shows test-score gains in the first year, leading some scholars to argue that Hawthorne effects hold there as well.) When considering the three sites together, however, this trend does not hold for test scores in the voucher interventions we studied. Overall, African-Americans who switched from a public to a private school gained 3.3 points after one year, and 6.3 points after two years, a pattern that is difficult to explain away as a mere Hawthorne effect.
When we began these evaluations, we had no idea that African-Americans would be the only group to demonstrate test-score gains. But after two years, in three separate cities, the evaluations yielded a remarkably consistent set of findings. The third-year data may force us to re-evaluate. But it is the data, and not any prior ideological commitment, that will inform our assessment of who benefits from school vouchers, and who does not.
To what should we attribute the gains that we have observed thus far? Mr. Rothstein thinks positive effects arise when voucher recipients “are surrounded by pupils with higher academic expectations.” Mr. Carnoy suggests that gains for African-Americans are due to “a more structured private school environment with smaller classes.”
We don’t know what the answer is. The surest way to find out, though, is to sponsor larger pilot programs that can be studied for longer periods of time.
We trust that those who would improve on our research will join us in supporting future voucher interventions, so that we can begin to discern why this particular educational intervention appears to help some of America’s students who need it most: low-income African-Americans living in central cities.
William G. Howell is an assistant professor of political science at the University of Wisconsin-Madison. Patrick J. Wolf is an assistant professor of public policy at Georgetown University in Washington and a guest scholar in governmental studies at the Brookings Institution. Paul E. Peterson is the Henry Lee Shattuck professor of government at Harvard University in Cambridge, Mass., where David E. Campbell is a Ph.D. candidate in government.
A version of this article appeared in the February 07, 2001 edition of Education Week as In Defense of Our Voucher Research