Debate Over Effectiveness Has Shaped Federal Policy

By James Crawford, April 01, 1987

Basic research on second-language acquisition provides far more “insights to teachers on effective classroom practices and [on] how they might help limited-English-proficient students,” says a panel of experts convened by the group.

And yet, there is no avoiding the controversy over effectiveness, which has dominated federal policy considerations. Even the ASCD feels the need to affirm that “bilingual education works.”

Skeptical views about native-language instruction began appearing on the nation’s editorial pages in the mid-1970’s. In 1978 came the first substantial criticism from experts, with the release of an American Institutes for Research evaluation of federally funded bilingual-education programs.

The study, which encompassed 38 bilingual programs and more than 7,000 LEP children in 150 schools, remains the largest evaluation of Title VII’s impact, although a broader longitudinal study is now in progress.

The AIR study reviewed the students’ achievement-test scores for 1975-76 (and a smaller sample for 1976-77) to compare the effectiveness of bilingual education with that of “submersion,” or no special help for LEP children. After pooling the results for both groups, researchers could find no significant advantage for bilingual programs; in English reading, the submersion students actually fared better, according to the report.

Press accounts of the AIR findings, which came at a time when many school districts were chafing under pressure from the federal office for civil rights to initiate bilingual programs, fanned the bilingual-education controversy.

Advocates of bilingual education attacked the report’s methodology. They argued that it unfairly judged the concept of native-language instruction by statistically lumping together a diversity of instructional approaches, language and socioeconomic groups served, and degrees of teacher preparation.

Above all, the quality of bilingual instruction varied widely, they said. “In such an analysis,” according to Tracy C. Gray and M. Beatriz Arias of the Center for Applied Linguistics, “the positive effects found with the good programs are often canceled out by the negative effects found with the bad programs.”

Also, the AIR study had measured student progress over only 5 months--too short a period, critics argued, to gauge the effectiveness of bilingual approaches. Moreover, control groups tended to have brighter students, including former bilingual-program students, thus biasing the comparison, the critics charged.

Rudolph C. Troike, then director of the National Clearinghouse for Bilingual Education, compiled a monograph describing 12 bilingual programs in which students “exceeded the achievement levels of control groups or district norms, and in several instances they exceeded national norms in English, reading, and math.”

Mr. Troike documented program successes among French-speaking children in St. John Valley, Me.; Spanish-speakers in Santa Fe, N.M.; Navajos in Rock Point, Ariz.; and Chinese in San Francisco. Clearly, both sides could marshal program-evaluation evidence.

Baker-de Kanter Report

In 1980, the U.S. Education Department’s office of planning, budget, and evaluation, which had directed the AIR study, embarked on an internal review of the research literature at the request of the White House Regulatory Analysis and Review Group.

The Carter Administration was then drawing up its ill-fated Lau regulations, which would have mandated bilingual education for LEP children wherever possible. But in August, without waiting for the department’s conclusions, the White House went ahead with the unpopular proposal, which the Reagan Administration subsequently withdrew in early 1981.

Keith A. Baker and Adriana A. de Kanter, respectively a sociologist and a public-policy researcher in the office of planning, budget, and evaluation, launched the project anyway. Their goal was to determine whether there was sufficient evidence to justify a federal mandate for bilingual education.

After reviewing more than 300 studies, about half of which were primary evaluations of actual programs, the researchers judged most of the research to be methodologically unsound. In the end, all but 28 studies were thrown out because of design weaknesses.

Among the evaluations left out were all 12 cited by Mr. Troike, along with many of bilingual education’s best-publicized success stories. Among those included by Mr. Baker and Ms. de Kanter was the AIR study, despite the controversy over its methodology.

In a majority of the 28 reports, the researchers found, there were no significant differences in student English or mathematics performance between transitional bilingual education and submersion. Among the studies that showed one method was superior, the score was split about evenly between the two. And some research indicated that either immersion or English as a second language was superior to bilingual education or submersion.

“No consistent evidence supports the effectiveness of transitional bilingual education,” Mr. Baker and Ms. de Kanter concluded. “An occasional, inexplicable success is not reason enough to make transitional bilingual education the law of the land.” They added that “the time spent using the home language in the classroom may be harmful because it reduces [the time for] English practice.”

“Federal policy should be more flexible,” the report recommended. “Although transitional bilingual education has been found to work in some settings, it also has been found ineffective and even harmful in other places. Furthermore, both major alternatives to transitional bilingual education--structured immersion and ESL--have been found to work in some settings.”

The researchers dismissed the central complaint raised against the AIR study, that the uneven quality of bilingual programs invalidated global comparisons with other approaches. “Without some independent measure of the success of implementation of each project,” they argued, this hypothesis “is a meaningless tautology.”

Even if the conclusion were true, the authors said, it might “be more cost-effective to switch to alternative instructional methods [than] to undertake large-scale efforts to redesign and properly implement transitional bilingual education programs.”

Little Evidence for Immersion

The study pointed to the need for “a widespread, structured-immersion demonstration program,” based on positive program-evaluation findings. Secretary of Education William J. Bennett has enthusiastically embraced immersion, along with ESL instruction, as a promising alternative to bilingual education.

But “there is extraordinarily little evidence for English-only programs showing much promise,” says Mr. Troike, now professor of education at the University of Illinois. While “there is an enormous methodological base in ESL, oriented toward whether one method works better or not,” he adds, there has been no research comparing the effectiveness of ESL-only approaches against bilingual education.

For the past decade, Teachers of English to Speakers of Other Languages, the ESL professional organization, has endorsed the use of bilingual education wherever practical.

As for structured immersion in English, Mr. Baker and Ms. de Kanter, writing in 1981, could cite only a controversial, unpublished study claiming positive effects for this approach. And, because its bilingual teachers provide a daily Spanish-language component, considerable dispute remains about whether the preschool program studied, located in McAllen, Tex., should be termed “structured immersion.”

The authors’ enthusiasm for the method was based largely on evaluations of French immersion with English-speaking children in Canada. Researchers who designed and studied these programs have warned, however, that their success should not be extrapolated to justify immersion for language-minority children, which they predict would be harmful.

Rejecting these objections as “untested theoretical arguments,” Mr. Baker and Ms. de Kanter said, “Immersion may not transfer successfully from Canada to the United States, but this is an empirical question that must be answered by direct test.”

Over the intervening six years, little additional evidence has accumulated to confirm immersion’s effectiveness here. In January, after Secretary Bennett’s most recent proposal to remove funding restrictions for alternatives to bilingual education, the Education Department could cite only one published primary study that supports the promise of English immersion.

The office of planning, budget, and evaluation is directing a four-year, multi-million-dollar study of “immersion strategies” by SRA Technologies Inc. But according to early test results, students receiving such instruction are faring poorly compared with children in both “late exit” and “early exit” bilingual programs--even in English reading. The department has argued against drawing conclusions from these “preliminary results.”

Critiquing the Critics

The Baker-de Kanter report, although never officially endorsed by the Education Department, was leaked to the press in September 1981. Its widely publicized conclusions were regarded as an indictment of bilingual education--an interpretation the authors have disavowed.

As with the AIR study three years earlier, bilingual-education supporters protested the report’s methodology, criticizing its program labels as oversimplified and misleading because they lumped together a variety of instructional treatments.

The researchers were charged with using a double standard of methodological acceptability to exclude evaluations favorable to native-language instruction and to include those that were unfavorable. And the “vote counting” approach--tallying studies on each side of the effectiveness question--was described by critics as a crude research tool at best, or an instrument of political bias at worst.

The most influential rebuttal of the Baker-de Kanter study appeared in 1985. Ann C. Willig, an educational psychologist now at the University of Texas, completed a review of the study’s data using a sophisticated statistical technique known as “meta-analysis.”

By combining “mean effect sizes,” or differences between programs, that sometimes were too small to be statistically significant in individual studies, Ms. Willig’s methodology allowed for a more precise measurement of overall differences in student achievement between bilingual and submersion classrooms.
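
The contrast between tallying studies and pooling their effect sizes can be sketched in a few lines of Python. The figures below are hypothetical and are not drawn from any evaluation discussed in this article, and the inverse-variance weighting shown is a generic textbook form of meta-analysis, assumed here only for illustration; it is not a reconstruction of Ms. Willig's actual procedure.

```python
import math

# Hypothetical (effect size, standard error) pairs, one per study.
# These numbers are illustrative assumptions, not data from any cited report.
studies = [(0.30, 0.20), (0.25, 0.20), (0.35, 0.20), (0.20, 0.20), (0.30, 0.20)]

Z_CRIT = 1.96  # two-sided 5 percent threshold under a normal approximation


def vote_count(studies):
    """Count the studies whose own effect is statistically significant."""
    return sum(1 for d, se in studies if abs(d / se) > Z_CRIT)


def pooled_effect(studies):
    """Return the inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    mean = sum(w * d for w, (d, _) in zip(weights, studies)) / total
    return mean, math.sqrt(1.0 / total)


significant = vote_count(studies)
mean, se = pooled_effect(studies)
print("studies significant on their own:", significant)
print("pooled effect:", round(mean, 2), "z-score:", round(mean / se, 2))
```

With these made-up inputs, no single study clears the significance threshold, so a tally would record no clear winner, while the pooled estimate does clear it because combining studies shrinks the standard error.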

The meta-analysis also adjusted for 183 variables that Mr. Baker and Ms. de Kanter had not taken into account, ranging from student and teacher characteristics to instructional methodologies to the language of achievement tests. But most important, the meta-analysis controlled for methodological flaws in the evaluations under review.

Reanalyzing the Data

The Willig study reanalyzed 23 of the 28 evaluations included in the Baker-de Kanter report, excluding nonprimary studies and the French immersion research as not applicable in the United States. Ms. Willig accepted the Education Department researchers’ decision to disqualify a long list of studies supporting the effectiveness of bilingual education.

Nevertheless, a meta-analysis of the Baker-de Kanter data “consistently produced small-to-moderate differences favoring bilingual education,” she reported. The pattern prevailed on English tests of reading, language skills, mathematics, and total achievement, as well as on Spanish tests of listening comprehension, reading, writing, total language, mathematics, social studies, and attitudes toward school or self.

These “significant positive effects,” Ms. Willig said, became visible only when statistical controls were applied for methodological inadequacies.

In other words, the better a study’s research design, the better bilingual education fared. For example, where evaluations used random assignment of subjects to experimental and comparison groups, the effect size was largest in favor of bilingual programs.

On the other hand, some research (such as the AIR study) used comparison groups in which students had “graduated” from bilingual programs, in which children were judged to be “English-dominant,” or in which teachers were bilingual. In some bilingual programs, successful students exited during the experiment and were replaced by incoming LEP children. Sometimes there was high teacher turnover or frequent reorganization in the bilingual classroom, but not in the comparison classroom.

A major problem for researchers is that random assignment of LEP children to “sink or swim” classrooms has been illegal since the U.S. Supreme Court’s Lau v. Nichols decision in 1974. “True experiments” comparing program alternatives are often impossible to conduct in a normal school. Still, some study designs are more successful than others in controlling unwanted variables.

The generally low quality of research in the field has been a favorite argument of those who maintain that the effectiveness of native-language instruction remains unproved.

Ms. Willig agreed that, “until quality research in bilingual education becomes a norm rather than a scarcity,” educators will be unable to fully address the needs of language-minority children. She emphasized, however, that the poorest studies are those that deny, rather than confirm, the benefits of bilingual education.

“In every instance where there did not appear to be crucial inequalities between experimental and comparison groups,” Ms. Willig concluded, “children in the bilingual programs averaged higher than the comparison children.”

Department Responds

Ms. Willig’s 1985 study was cited recently by several experts polled by the U.S. General Accounting Office as evidence against the Education Department’s position that research is inconclusive on the value of native-language instruction.

A Dec. 15, 1986, response to the GAO contained the department’s first official reply to Ms. Willig: “The Willig meta-analysis reviewed a nonrepresentative and very small sample of the existing research and used an inappropriate methodology. It is by no means a comprehensive review of the literature.”

Asked to explain why Ms. Willig’s study was any less comprehensive than the Baker-de Kanter report, Mr. Baker said his review was later expanded to include 39 studies. This version, published in 1983, reached conclusions identical to those in the 1981 draft.

Meta-analysis is an inappropriate research tool, argued Alan Ginsburg, Mr. Baker’s superior in the office of planning, budget, and evaluation. The issue, he said, is not whether bilingual education is effective “on average, but are there programs that might be appropriate in one community, but not in another? The averaging effect of meta-analysis is quite limiting.”

A version of this article appeared in the April 01, 1987 edition of Education Week as Debate Over Effectiveness Has Shaped Federal Policy