Reporting of Software Product-Testing Stirs Debate

With the results of a forthcoming federal study of educational software still under wraps, questions are arising about how it has been conducted—particularly the government’s decision not to disclose individual performance results for the 15 computerized curriculum packages being studied.

While the companies involved will receive results for their own products, the public will see only aggregated findings for the four categories of programs examined in the $10 million study. Those categories are 1st grade early reading, 4th grade reading comprehension, 6th grade pre-algebra, and 9th grade algebra.

The study, authorized by the federal No Child Left Behind Act, stands out as “one of only a few national-level, randomized-trial evaluations that the federal government has ever done,” said Michael L. Kamil, a Stanford University education researcher who is on the study’s technical-advisory group.

Officials picked 17 computer-based reading and math products to be evaluated in the 2004-05 school year as part of a major federal research project. Two were dropped, leaving 15 products by 11 companies.

GRADE 1 EARLY READING
Academy of Reading (Autoskill International Inc.)
Destination Reading (Riverdeep Inc.)
The Waterford Early Reading Program (Waterford Institute)
Headsprout (Headsprout Inc.)
Plato FOCUS (PLATO Learning Inc.)

GRADE 4 READING COMPREHENSION
Academy of Reading (Autoskill International Inc.)
Read 180 (Scholastic Inc.)
KnowledgeBox (Pearson Digital Learning)
Leaptrack (Leapfrog Schoolhouse)

GRADE 6 PRE-ALGEBRA
SmartMath (Computaught Inc.)
Achieve Now (PLATO Learning Inc.)
Larson Pre-Algebra (Meridian Creative Group)

GRADE 9 ALGEBRA
Cognitive Tutor (Carnegie Learning Inc.)
Algebra (PLATO Learning Inc.)
Larson Algebra (Meridian Creative Group)

SOURCE: Mathematica Policy Research Inc.

The study was designed by a panel with “impeccable credentials in methodology,” said Mr. Kamil, who has read a draft of the study but said he could not discuss the findings. “By my reading, the results are very important and should in fact influence decisions about what to do,” he said last week.

Federal officials also say the combined results will give valuable information about the impact of different types of curriculum packages, some of which blend computer-based exercises with textbooks and other activities.

But some researchers and educators say the aggregation of the results for the different products was a lost opportunity to produce useful information on which products to buy or to avoid.

Aggregating may be “really cool from a theoretical perspective,” said Arie van der Ploeg, a researcher at Learning Point Associates, a research and policy group in Naperville, Ill., but “to some extent, educators are shoppers; they are saying, ‘Right now, I need tools more than I need theory.’ ”

Formulated in 2003, the study was one of the first commissioned by the U.S. Department of Education’s Institute of Education Sciences, or IES, to follow the so-called research “gold standard” of using random assignment to test the effectiveness of educational programs or approaches, often known as “interventions.”

The study has tested 15 technology-based products in 132 schools, which were geographically diverse and generally had students with lower achievement and deeper poverty than average, according to Mark Dynarski, the lead researcher for the study at Mathematica Policy Research Inc., an independent, for-profit research organization in Princeton, N.J. Also taking part is the Menlo Park, Calif.-based SRI International Inc.

More than 9,000 students and 439 teachers were in either the control or treatment group. The 33 school districts chose from among several interventions the ones that their schools would test, but teachers were randomly assigned to use them or to use their regular methods alone. Near the start and the end of the 2004-05 school year, students took standardized tests to gauge their progress.
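The random-assignment step described above can be illustrated with a minimal sketch. The article does not say how the researchers carried out the randomization, so the even split, the fixed seed, and the name `assign_teachers` are all assumptions made for illustration only.

```python
import random


def assign_teachers(teachers, seed=0):
    """Randomly split a list of teachers into treatment and control groups.

    Hypothetical helper: the even split and the fixed seed are assumptions,
    not details reported for the federal study.
    """
    rng = random.Random(seed)   # seeded so the assignment can be reproduced
    shuffled = list(teachers)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),  # would use the software product
        "control": sorted(shuffled[half:]),    # would use regular methods alone
    }


groups = assign_teachers(["Teacher A", "Teacher B", "Teacher C", "Teacher D"])
```

Because chance rather than preference decides who uses the product, differences in test-score gains between the two groups can more plausibly be attributed to the product itself.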

The report is still under review, but officials at the IES said it would likely be out by the end of summer. The IES and Mathematica extended the study over the 2005-06 school year, an option in the federal contract.

In the second year, the results will be broken out product by product, Mr. Dynarski said.

Choice Seemed ‘Sensible’

Audrey J. Pendleton, the IES project manager for the study, said it represents the first time that the Education Department’s research arm has set out to evaluate specific commercial products.

The decision to aggregate the results was made in the summer of 2003 as a way to encourage companies to volunteer to take part in a competitive selection process, which favored products with previous evidence of effectiveness. Participation required companies to make significant contributions of products and teacher training.

“There was concern on the advisory panel that if we selected products and the results were not good, that companies would be very unhappy and say, ‘Why did you choose our products?,’ and there would be lawsuits,” Ms. Pendleton said.

While reporting results by “groups of like products … would lower the risk for the publishers,” she said, “there was concern on the part of the publishers that they may not be grouped with products they felt were comparable for their product.”

Mr. Dynarski, who led the study design team, recalled of the decision: “At the time, it seemed quite sensible. We were really asking [companies] to jump into something that was entirely unpredictable. There were no schools, no teachers identified, and we were asking them, ‘Do you want to be studied?’ ”

Moreover, the portion of the No Child Left Behind law that mandated the study gave wide latitude. “We had a feeling we didn’t have to report on individual products to meet the mandate,” Ms. Pendleton said.

Still, the decision not to publish individual results for Year 1 has puzzled some observers.

Jerry D. Weast, the superintendent of the 139,000-student Montgomery County, Md., school district, said such a soft-focus approach would make it hard to use the results to make changes.

“I can’t imagine telling my teachers, ‘We know what may work, but we’re not going to tell,’ ” he said. “That’s not going to hit the troops very well in these high-pressure times.”

In addition to gauging the effectiveness of the programs, the study aims to describe the conditions under which they were used. To that end, observers from SRI International and Mathematica studied the quality of implementation, including teachers’ behavior and the amount of time the students spent on computers. But in the first-year report, the implementation descriptions will also be blended, shielding the products’ identity.

Mr. van der Ploeg, whose group advises school districts, said that giving educators advice from such general conclusions is like saying, “ ‘Eat all your vegetables.’ It doesn’t help you pick and choose from the market basket.”

Although the decision by the Institute of Education Sciences to leave off product names ostensibly was to encourage companies to participate, it hasn’t necessarily made the companies happy. A researcher who is a developer of Cognitive Tutor, one of three programs being tested for 9th grade algebra, said that aggregating results makes it more likely that the positive will cancel out the negative, yielding inconclusive findings.

“You end up with a study [in which] it’s not clear what it’s studying,” said Steven Ritter, the senior vice president of research and development at Carnegie Learning Inc., the Pittsburgh-based company that makes the program. “In a sense, our results, and how the results are perceived for our product, are dependent on how the other products perform as well,” he added. “That’s an uncomfortable position to be in.”
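The cancellation concern can be shown with a small arithmetic sketch. The product names and effect sizes below are invented purely for illustration; they are not results from the study, which had not been released when this article was written.

```python
from statistics import mean

# Hypothetical effects, in test-score points, for three made-up products.
# These numbers are invented for illustration and are NOT study findings.
hypothetical_effects = {
    "Product A": 4.0,   # helps
    "Product B": 0.0,   # no measurable effect
    "Product C": -4.0,  # hurts
}


def aggregate_effect(effects):
    """Return the simple average effect across all products in a category."""
    return mean(effects.values())


pooled = aggregate_effect(hypothetical_effects)
best = max(hypothetical_effects.values())
```

In this made-up case the pooled figure is zero even though one product shows a four-point gain, which is the situation Mr. Ritter describes: a reader of the aggregate alone could not tell the helpful product from the harmful one.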

Thinking Changed

Companies do have qualms about taking part in large-scale independent studies, said Karen Billings, the vice president of the education division of the Software & Information Industry Association, a Washington-based group that is active in policy discussions on federal research in education technology.

With restricted access to test sites and “rigid start and stop dates” for trials, she said, companies have little ability to address implementation problems, such as making sure software is installed on correct hardware, that educators “buy in” to using it, and that teacher professional development is successful.

Despite those risks, Ms. Billings said, “there is a great deal of value when they can show achievement gains, when someone else is in charge of the study and, particularly, the implementation phase.”

As it happened, companies flocked to the study in 2003, nominating 163 products, of which 17 from a dozen companies were selected. The winners made marketing hay with the news of their selection, although for various reasons two products later were dropped.

The IES plans to give the companies the study data on their own programs, Ms. Pendleton said. The companies may release their data, subject to limits to protect the confidentiality of teachers and students.

Officials at the IES underscored that if the study were starting today, data on individual products would be reported from the get-go.

In previous years, it seemed enough to give educators general findings on strategies and products for teaching reading and math, Ms. Pendleton said.

“Since that time, our thinking has evolved, both here at IES and among developers and publishers and in districts,” she said. “People want to know very explicitly about particular products.”

Andrew Trotter

Andrew Trotter was an assistant editor for Education Week.