When the superintendent in Memphis, Tenn., announced he was scuttling that district’s long-running effort to install schoolwide improvement programs in every school in the city, the decision seemed unusual enough to merit national attention.
But the July closing of the city’s closely watched experiment was just the latest in a string of setbacks in the nationwide movement known as comprehensive or “whole school” reform.
Since 1998, districts in San Antonio and Miami-Dade County have abandoned some efforts to adopt well-known, “off the shelf” improvement models on a large scale. Early reports from New Jersey, where 30 poor districts are under a 1998 court order to adopt schoolwide improvement models, also suggest that implementation in that state, while still on track, is running into obstacles and pockets of resistance.
And a report by the RAND Corp., published earlier this year, suggests that in districts trying to “scale up” schoolwide reform programs, the efforts may be producing significant gains in only about half the participating schools.
Of the 163 schools that the think tank’s researchers had tracked over two to three years, only about half made bigger gains in mathematics and reading achievement than their districts did overall.
Experts say those developments, taken together, suggest that the movement for whole-school improvement could be entering a new, more challenging phase.
“Implementation is much harder than any of us expected,” said Henry M. Levin, who founded an improvement model known as Accelerated Schools. “I think this is probably just a long, slow process, and too much was claimed for it too soon.”
Industry ‘Shakeout’
The movement reflects the long-held recognition that piecemeal attempts to improve schooling by putting in a reading program here or a new management strategy there were not working—especially in some of the nation’s poorest and lowest-achieving schools. To improve learning for all of the children under their care, proponents of the schoolwide approach argued, schools and districts had to come up with more coherent programs that could deal with all aspects of schooling.
But rather than start from scratch to create such programs, the thinking went, schools might be better off trying some models that had already accumulated successful records.
Often-cited examples of such programs included Success for All, a reading-based program for elementary schools devised by researchers at Johns Hopkins University in Baltimore; Mr. Levin’s Accelerated Schools program; and Direct Instruction, an approach developed in the 1960s by Siegfried Engelmann.
Nurtured since 1992 with $100 million from New American Schools, a private, nonprofit organization in Arlington, Va., program developers began to move their models out into more and more schools. And, with Memphis at the head of the pack, a few districts began experimenting with the programs on an even larger scale.
In the 118,000-student Memphis district, all 163 schools were required to adopt an improvement program of their own choosing.
Nationwide, the movement got an added boost in 1997, when Congress approved the Comprehensive School Reform Demonstration Program, a grant program aimed at helping mostly poor schools put in place research-proven improvement models. Since then, federal lawmakers have sunk $480 million into the program, which is underwriting improvement efforts in around 2,000 schools across the country.
“Comprehensive school reform has really changed the way that educators come around the table,” said Karen Hinton, the vice president for external affairs for New American Schools. “It’s not a faddish trend.”
That’s why proponents suggest that the setbacks they see now may be more of a transitional phase than a sign of failure.
“The situation, as I see it, is typical of things that happen probably with any innovation in any field, where at the outset there’s a great deal of enthusiasm, and then people get realistic about what the methods can and cannot do and there’s a shakeout,” said Robert E. Slavin, a co-developer of the Success for All model used by 1,800 schools. “The stronger programs continue, and then it becomes part of the landscape.”
The analogy he draws is to the Internet industry, which saw its stock prices drop as the field underwent a shakeout.
“Is that failing? Of course not,” said Mr. Slavin, who is also co-director of the Center for Research on Students Placed at Risk, a federally funded research center based at Johns Hopkins. “It’s actually the stabilization of something that is a major change for society.”
To longtime critics of the whole-school approach, however, the problems cropping up are proof of the movement’s wrongheadedness.
“The whole edifice that was constructed around the notion of schoolwide reform models and approaches being better was not correct,” said Stanley Pogrow, an associate professor of education at the University of Arizona in Tucson. “I’m not saying we shouldn’t have experimented with them. I’m saying they should not have been, to the exclusion of almost everything else, what the [U.S.] Department of Education pushed and promoted and funded research around.”
Flawed Studies?
Mr. Pogrow’s criticism has been characterized as sour grapes because he markets a more narrowly focused, computer-based program for teaching critical-thinking skills that has suffered in the rush to embrace broader strategies.
But he, like other critics, also points to a growing body of research suggesting that some of the studies favoring schoolwide programs such as Success for All have been flawed, and that the programs are not meeting their initial promise.
Early studies of Memphis’ programs suggested that, from 1995 to 1999, students in the schools that were restructuring were making greater achievement gains than students in schools with demographically similar enrollments that had not yet undertaken the changes. That study, produced by the University of Memphis, spurred other districts, such as Atlanta, to follow Memphis’ example.
However, when district researchers conducted their own study at the request of their new superintendent, Johnnie B. Watson, they came to a more pessimistic conclusion. Using different study methods, they found that, over the first five years of the initiative, students’ test scores were stagnant or declining in mathematics, reading, and English. (“Memphis Scraps Redesign Models in All Its Schools,” April 18, 2001.)
In a new critique paid for by New American Schools, though, an East Tennessee State University professor expresses some skepticism about the district’s findings. James E. McLean, an education professor at the university’s Johnson City campus, said the district study was flawed because the researchers failed to use comparison groups and because they measured progress in terms of changes in the percentages of students who scored above the 50th percentile on tests.
“While considering the percentage of students above the ‘national average’ sounds impressive,” he said, “it may not be appropriate.” The reason: The methodology fails to pick up subtler changes in students’ overall average achievement, particularly for students who may have started out with bottom-hugging scores but could not quite pass the “high jump” the researchers set out for them.
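The distinction is easy to see in a minimal sketch, written here in Python with hypothetical percentile ranks invented purely for illustration (they are not figures from the Memphis study): a cohort whose average rank climbs several points can still show no change at all in the share of students clearing the 50th-percentile bar.

```python
# A minimal sketch using hypothetical percentile ranks invented for
# illustration (not data from the Memphis study). The cohort's average
# rank rises by more than four points, yet the share of students scoring
# above the 50th percentile (the yardstick used here) never moves,
# because the students who improve start too far below the cutoff.

year_one = [22, 30, 35, 41, 48, 55, 62, 70]   # national percentile ranks
year_two = [30, 38, 43, 48, 49, 56, 63, 71]   # low scorers gain, but few cross 50

def mean(scores):
    return sum(scores) / len(scores)

def pct_above_cutoff(scores, cutoff=50):
    return 100 * sum(1 for s in scores if s > cutoff) / len(scores)

print(f"Average percentile rank, year one: {mean(year_one):.1f}")   # 45.4
print(f"Average percentile rank, year two: {mean(year_two):.1f}")   # 49.8
print(f"Share above 50th percentile, year one: {pct_above_cutoff(year_one):.0f}%")  # 38%
print(f"Share above 50th percentile, year two: {pct_above_cutoff(year_two):.0f}%")  # 38%
```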
Schools’-Eye Views
In a similar vein, supporters of comprehensive improvement projects criticize the RAND study for measuring the progress of students in schools undergoing restructuring against the averages for their districts. That methodology poses problems, they say, because the schools trying to incorporate new strategies were among the poorest in their districts. (“RAND Finds Mixed Results for School Reform Models,” April 18, 2001.)
They also contend those schools turned out to be unrepresentative of the schools nationwide that were getting support from New American Schools to “scale up” schoolwide changes, because they were concentrated in a few states or in less successful reform models.
The problem with the RAND study, said Steven M. Ross, the University of Memphis researcher who conducted the first studies in that district, is one it shares with most evaluations of such improvement initiatives: it draws on data collected at the school level, a cruder measure than data collected on individual students.
“Hardly any program in the history of education would show sustainable gains with school-level data,” he said. “Educational research has to be extremely sensitive to factor out extraneous differences.”
Mark Berends, a co-author of the RAND study, does not disagree.
“Because there is so much variation in implementation even in the same school, it’s very difficult to look for achievement effects,” he said.
Researchers conducting such evaluations also disagree over how to address the fact that in many inner-city schools, 20 percent to 40 percent of students might be new to the school and new to the programs under study. One school of thought contends that it’s unfair to the programs to include achievement data on those students; another argues for including those students because schools, too, have to include them and adjust their teaching accordingly.
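Taken together, those two complications help explain why school-level averages can hide real progress. A minimal sketch, again in Python and again with hypothetical numbers chosen only for illustration, shows a school whose average score slips in a year when every continuing student gains ground, simply because new arrivals enter with lower scores.

```python
# A minimal sketch with hypothetical scores invented for illustration.
# Every student who stays in the school for both years gains six points,
# yet the school-level average drops, because a third of the year-two
# enrollment is new arrivals who enter with lower scores.

continuing_year_one = [40, 45, 50, 55, 60, 65]   # students enrolled both years
continuing_year_two = [46, 51, 56, 61, 66, 71]   # each continuing student gains 6 points
new_arrivals_year_two = [28, 33, 38]             # students who enrolled after year one

def mean(scores):
    return sum(scores) / len(scores)

school_avg_year_one = mean(continuing_year_one)
school_avg_year_two = mean(continuing_year_two + new_arrivals_year_two)

print(f"School average, year one: {school_avg_year_one:.1f}")   # 52.5
print(f"School average, year two: {school_avg_year_two:.1f}")   # 50.0
print(f"Average gain among continuing students: "
      f"{mean(continuing_year_two) - mean(continuing_year_one):.1f} points")  # 6.0
```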
The bottom line, says Mr. Berends, is that researchers can’t yet say for sure whether large-scale efforts to install comprehensive improvement programs in schools actually work.
“There’s just very little research to date on the variety of models out there in different settings that show deep implementations and sustained implementations, so that we can even think about achievement in student learning,” he said.
The hope is that the picture will become clearer over the next five years through some of the newer research efforts being underwritten by the federal demonstration program. Last year, the Department of Education awarded $21 million in grants to six research groups to study the progress and effectiveness of federally financed schoolwide reforms.
In the meantime, the growing pressure to show research-backed results is producing a steady trickle of studies—most of them positive—on individual school reform designs, such as Accelerated Schools, Success for All, James P. Comer’s School Development Program, E.D. Hirsch Jr.'s Core Knowledge approach, and America’s Choice.
“There’s evidence across all the reports that a whole bunch of models can have a positive impact on student achievement,” said Mary Anne Schmitt, the president and chief executive officer of New American Schools. “But there’s also evidence that we can’t guarantee that any given model will have an impact on student achievement.”
What her organization and others have learned from all of the studies, however, is what conditions must be in place for comprehensive improvement efforts to succeed. Districts that look to New American Schools for support now have to put together portfolios showing that they’re willing to stay with the program for the long haul, provide the necessary teacher training and financial support, and meet other criteria that studies say may be important to sustaining schoolwide programs.
“We’ve all learned a lot about how to create those conditions so we can increase the probability of success to something much greater in the future,” Ms. Schmitt said.
No One Size Fits All
In fact, some experts contend that the setbacks the movement is experiencing now have little to do with anything researchers have to say about the reform models’ overall efficacy. Rather, the problems reflect management missteps, political opposition, outside pressure, and practical impediments that bedevil school systems on a day-to-day basis.
One of the biggest mistakes, many proponents of schoolwide improvement programs say, may have been attempting to impose reform models on a large scale, much as Memphis, the New Jersey districts, and Miami-Dade County have done or are trying to do to one degree or another.
“I think for schoolwide reform to really work, you have to have the buy-in of the entire staff,” said Nereida Santa-Cruz, the assistant superintendent for curriculum support services in the 361,000-student Miami-Dade system.
Of the 45 schools in that district that began working with Success for All, only seven are still using the program.
“We were not successful with Success for All,” Ms. Santa-Cruz said. “For whatever reason, it was not a program for which we could show the enormous increases promised by the developer.”
Some of the resistance to the improvement models in New Jersey has come from schools that were doing well on their own before the court order, according to Bari Anhalt Ehrlichson, an assistant professor of policy at Rutgers University in New Brunswick, N.J., who has been following the attempts at whole-school reform in that state.
“When people say resistance, you often think of the lazy teacher who doesn’t want to change, but in some of these cases, these were protective faculties who were excited about what was already going on in their schools and had the data to back it up,” she said.
A more garden-variety problem plaguing many schoolwide reforms is a change of leadership at the top. New principals and new superintendents are often more eager to make their own mark on a school system than they are to continue initiatives their predecessors launched. Without continuing support from the top, whether financial or otherwise, many schoolwide programs tend to wither away.
“For all the reforms, that’s really the death knell,” said Mr. Levin, a professor of economics and education at Teachers College, Columbia University.
Ms. Ehrlichson says turnover at the staff level can also hinder schools’ progress. In some of the schools she has been tracking for three years, only 10 percent of the staff members have been there from the beginning.
“Developers almost have to offer year-one training every year,” she said. Program developers also complain they are given limited time to work with teachers to try to bring about deep, lasting changes in their instruction.
Another difficulty is that policymakers tend to show little patience for long-term change—a problem when many program creators say their models take two to three years or longer to show results.
The pressure to perform has also heightened in many states as policymakers rely increasingly on standardized tests to determine which students can graduate on time and which schools qualify for rewards or punishments. That can be a problem, experts say, if the tests are not compatible with the curricula and teaching strategies the restructuring programs are using.
“I’ve seen many schools end up dropping their efforts because, while they might see some improvement on other measures, they don’t see it on state standardized tests,” said Amanda L. Datnow. An assistant professor in the department of theory and policy studies at the University of Toronto’s Ontario Institute for Studies in Education, she has been tracking several such reform projects across the United States.
But, she said, it’s too soon to count out such models. “I think there’s some hope that, under the right conditions and if used for the right reasons,” she said, “these models can produce some improvement for student learning.”