The Diogenes Factor
Why it's hard to get an unbiased view of programs like 'Success for All.'
The federal government and foundations sponsor report after report describing programs said to raise student achievement, particularly that of poor, urban, and minority children. Many articles in scholarly and practitioner journals also describe programs that apparently raise students' test scores.
Yet, the best long-term indicator of achievement, the National Assessment of Educational Progress, shows no consistent upward trends during the past three decades. The latest international achievement comparisons, moreover, show U.S. students ahead in the early school years but falling to the back of the pack by the senior year of high school. The longer they are in school, the further they fall behind the averages of other countries. What explains this paradox of successful programs and failing students?
Despite many reports of success, we find few objective evaluations conducted by independent investigators. Staffs of government agencies and program developers apparently believe their programs work, and usually commission or carry out their own evaluations to prove their point. Consciously or not, their beliefs can strongly affect the design, conduct, and results of evaluations.
Bias can affect even "pure" research results when politics, jobs, and money are not at issue. To avoid such bias in medical research, for example, investigators use double-blind experiments: Neither patients nor caregivers know which patients get the experimental medicine and which receive placebo pills known to have no physiological effect. This procedure enables investigators to separate the effect of the drug from the effect of patients' suggestibility or belief in the treatment's efficacy.
In educational evaluation, placebo effects are usually built in rather than controlled, since program developers, administrators, and teachers all know that they are employing a new program and that they are being watched. This, of course, may make programs appear more successful than they would be in normal practice.
Federal support of education programs, moreover, raises powerful pressures. The federal government, for example, has spent more than $100 billion on the Chapter 1/Title I program to raise the achievement of poor children. With such huge amounts of money at stake, program developers, administrators, and evaluators have strong financial interests in showing success. Their jobs, salaries, and perquisites depend on continued funding. What's more, program developers, who have been supported by government and foundations, increasingly are selling their materials and services to schools.
When for-profit firms offer programs to schools, educators remain on guard: Let the buyer beware. Government agencies, foundations, and other not-for-profits are often thought to be superior in knowledge, objectivity, and altruism. They, however, are increasingly driven by monetary and political pressures, which are not necessarily in the public or students' interest. The same government agencies and foundations that fund the programs, for example, hire evaluators, evaluate the programs themselves, or allow program developers to evaluate the programs. Having said the programs would succeed, can agency administrators easily return to Congress or their foundation's governing board to say they were wrong? Are they likely to hire independent-minded evaluators?
The principle of "conflict of interest" is hardly news. Aristotle warned his fellow citizens to consider the source, and the ancient Romans asked who would benefit from proposed conclusions and decisions. What is new is the pervasiveness of what we will call "the Diogenes factor" in program evaluation. According to ancient Greek lore, Diogenes searched daytime Athens with a lighted lantern for an honest man. Though outright fabrication may be rare in educational evaluation, we can easily find selective evidence and misleading comparisons that favor funded programs and lead to substantial overestimates of program effectiveness.
Consider "Success for All," which provides a noteworthy example of the Diogenes factor in a federally supported program. Though its own developers declare it a huge success, independent evaluators find essentially negative evidence.
In the January 1998 Phi Delta Kappan, a chief Success for All developer, Robert Slavin, and his Johns Hopkins University colleague Olatokunbo Fashola ask which widely disseminated reform programs meet their criteria for achievement. Since Success for All revenues are expected to grow from $15 million to $30 million in the coming year, Mr. Slavin and Ms. Fashola may not be disinterested parties. They nonetheless review Success for All, its extension Roots and Wings, and 11 other programs, including E.D. Hirsch Jr.'s Core Knowledge, Henry Levin's Accelerated Schools, and James P. Comer's School Development Program.
Their evaluation reveals only two programs that meet their achievement criterion--Success for All and Roots and Wings. In accord with this assessment, they devote three full columns of print to their own two programs, and between about a fifth and a full column for the other 11.
Ms. Fashola and Mr. Slavin estimated the numerical effects of Success for All on achievement; the average of their estimates is one of the largest program effects ever reported and would place Success for All students at the 85th percentile of control-group students. All of their estimates, they say, were above the level they consider educationally significant.
On the other hand, an independent evaluation of Success for All by Elizabeth Jones and Gary and Denise Gottfredson of the University of Maryland showed an average effect of near zero--that is, Success for All students scored at about the 50th percentile, the same as matched control groups. In five of 10 comparisons, the Maryland group found that control groups outscored Success for All students. The Maryland group also compiled six estimated effects from other independent evaluations of Success for All. In two cases, Success for All students did better than control groups; in two, the differences were not educationally significant; and in two, control groups outscored Success for All students.
Success for All's expressed goal is to bring all children to or near grade level by 3rd grade so they may progress normally in the later grades. In another independent evaluation, Richard Venezky of the University of Delaware pointed out: "According to the project's own reports, Success for All has clearly not led to all students' achieving at or near grade level by the end of grade 3, even with only reading and language arts [which Success for All emphasizes] included in the outcomes assessment." Mr. Slavin and Ms. Fashola make no mention of their own negative findings in their Kappan report.
Mr. Venezky carried out a Success for All evaluation in Baltimore, where the program originated and might be expected to do well. He nonetheless concluded that the average Success for All student failed to reach grade-level performance by the end of grade 3. Even with further Success for All instruction, students continued to fall further behind national norms. By the end of 5th grade, they were almost 2.4 years behind.
Thus, the Success for All developers and independent reviewers differ hugely in their estimates of its effectiveness. Any one or all of the following reasons probably account for these differences:
1. Success for All insists that 80 percent of a school's teachers vote in a secret ballot to adopt the program, but schools that reach such a consensus are unusual in their cohesion and determination. Even if matched in socioeconomic status to control schools, they are hardly run-of-the-mill schools, where such agreement is rarely achieved.
2. Success for All concentrates on reading, possibly sacrificing math, science, and other subjects. Reading results are misleading estimates of the program's overall effects on the broad range of primary school subjects and skills.
3. Unlike the standardized national achievement tests used by independent evaluators, Success for All employs individually administered tests that favor the program and are subject to biased impressions and scoring by Success for All's own evaluators.
4. In its Kappan comparisons with other programs, Success for All cites its own positive effects, not its own negative findings, nor the negative findings of independent evaluations.
The Success for All evaluation story is hardly unique. The poor progress of American students during the school years relative to those in other countries is a national tragedy, sadder still among poor and minority students who tend to fall even further behind. Five independent evaluations of the $7 billion-per-year Chapter 1 program for disadvantaged children showed little achievement difference between program and control groups.
Yet federal funds continue to support the promulgation and biased evaluation of failed programs. This is worse than doing nothing. It wastes vast resources, obscures the problem, and delays productive solutions. Diogenes, lend us your lantern.
Herbert J. Walberg is a research professor of education and psychology at the University of Illinois at Chicago. Rebecca C. Greenberg is a doctoral student at the university.