Test Firm, N.Y.C. Officials Say Scores Were ‘Overstated’

By Catherine Gewertz — June 20, 2001 3 min read

The New York City board of education and CTB/McGraw-Hill, one of the nation’s largest test-makers, said that more than 60,000 6th graders in the city received higher scores on a reading test last year than they should have. But each side offered differing theories on how it happened.

In a joint statement issued June 8, the city’s schools chancellor, Harold O. Levy, and the president of CTB/McGraw-Hill, David M. Taggart, said the results of the April 2000 reading tests that the city and the test-maker designed together were “overstated.”

The problems stemmed from “a combination of factors related to test design” and had no effect on the accuracy of students’ answers, they said. In an interview later, Mr. Taggart speculated about a variety of factors that could have worked together to create an overly high aggregate result.

Students simply might have delivered an exceptionally strong performance, he said. Some students might have been familiar with some of the questions because 12 of the 50 multiple-choice questions had been used in a 1999 version of the test, he said. In addition, he suggested, the results could have been affected if the administration of the test was inconsistent from one school to another.

“Were those scores wrong? No, they were not wrong,” Mr. Taggart said. “What may have happened is that those scores may overstate to some extent the level of skills those students had. But it’s important to remember those students were improving from 1999 to 2000. There’s no question about that. The question is how much.”

Robert Tobias, the executive director of assessment and accountability for the 1.1 million-student New York school system, had a different view. He theorized that the group's scores might have been inflated because the test was deemed to be more difficult than it actually was, allowing students to receive more points for answering fewer of the questions correctly.

On the 2000 test, he said, students were judged to meet grade-level standards if they answered 70 percent of the questions correctly. On the 2001 test, students needed to answer 80 percent to achieve the same benchmark.

Mr. Tobias disagreed that familiarity with questions used in a previous year contributed to the problem, noting that students scored about the same on the 12 repeated questions as they did on the new ones.

Warning Signs

Mr. Tobias and CTB/McGraw-Hill officials became suspicious right away, when the 2000 test results showed that 15 percentage points more students had achieved grade-level standards than had done so in 1999—a gain of more than double what is normally expected, Mr. Tobias said. But the test-maker stood by the results, and the district agreed to wait for the next year's scores to draw conclusions.

But when the 2001 results showed a 13-percentage-point drop in the number of students meeting grade-level standards, that finding bolstered officials' suspicions that the 2000 scores were too high. In 1999, 28.6 percent of 6th graders had met the standard; the next year, the figure jumped to 43.9 percent; and this year, it dropped to 30.9 percent—a spiky performance line that "really got our attention," Mr. Tobias said.

The problem brought back memories of a foul-up with another CTB/McGraw-Hill test in 1999, when 9,000 students were sent to summer school because of an incorrectly scored reading test.

Mr. Tobias said he was confident that no students were promoted improperly as a result of the 2000 glitch because test scores are considered in combination with attendance and classroom performance in making such decisions.

Mr. Taggart and Chancellor Levy said in their statement that the 2001 results were accurate because the questions raised by the 2000 test were addressed in this year's version. Mr. Tobias and Mr. Taggart explained that the 2001 test contained all new items and greatly increased the portion of questions that had been tested on a national sample of students.

Mr. Tobias said that CTB/McGraw-Hill had done a good job on many of the city’s tests, but that the city plans to seek out other makers of reading assessments. Those plans were made before the current problem surfaced. But in issuing a new contract, Mr. Tobias said, “part of the consideration is the track record of the company.”

“Obviously,” he said, “any anomaly that occurs, particularly those that aren’t explained, shakes your credibility in the products and services you are receiving.”


A version of this article appeared in the June 20, 2001 edition of Education Week as Test Firm, N.Y.C. Officials Say Scores Were ‘Overstated’
