President Bush’s plan to test every student in grades 3-8 annually in reading and mathematics has set off a fierce debate about what those tests should look like.
The disagreement pits America’s history of state and local control of public education against the desire for greater uniformity in reporting student achievement. And its outcome is sure to be the subject of intense negotiations and a major test of the Bush administration’s commitment to its education agenda.
At issue is the amount of flexibility that states should have in designing their testing systems. The National Governors’ Association and the Council of Chief State School Officers, among other groups, argue that states should be allowed to use a mix of state and local assessments to measure annual student progress toward state standards. Anything else, they assert, would constitute an undue federal intrusion into state policymaking, derail states’ education efforts, and cost too much.
But others, including influential members of the business community, say that such a mix of tests in different grades is essentially what Americans have now. Such tests, they contend, do not yield the kind of comparable information that would let parents know how each child is progressing or how schools are performing from year to year.
“My concern is that if we allow for a hodge-podge of tests that are not comparable from grade to grade and are not administered on a statewide basis, then we’ll have no real understanding of whether these federal funds are being used in a way that’s benefiting children,” said Krista Kafer, an education policy analyst for the Washington-based Heritage Foundation, a conservative think tank. “If the testing is not done in a meaningful way, we’re really no better off than we are now.”
Under the administration’s original plan, each state would select or create tests for grades 3-8 that were aligned with its academic standards and that produced comparable data from school to school and grade to grade within a state.
Mr. Bush, who oversaw such a program in Texas, argues that the annual testing is needed for greater accountability in education, as well as to let parents and teachers gauge individual student achievement every year.
The federal government would financially reward states that made the most progress on their state tests, as confirmed by the National Assessment of Educational Progress, a federally mandated testing program. And it would cut aid to administer federal education programs for those that failed to improve. States, districts, and schools also would be responsible for improving test results and closing the achievement gap between groups of students.
‘Reproduce the Status Quo’
But the proposal is slowly being whittled away on Capitol Hill. The Senate bill, S 1, which would reauthorize the Elementary and Secondary Education Act, calls for a “system of high-quality, yearly student assessments,” but does not specify that the tests be comparable across districts or grades within a state. The House bill, HR 1, which the House education committee began marking up last week, contains similarly vague language and would allow states to confirm their progress with tests other than NAEP.
“I don’t know what the point of it is,” said Diane Ravitch, a former assistant secretary of education in the previous Bush administration. “It would, in effect, reproduce the status quo and probably add more testing, much of it redundant and very little of it comparable.
“We’ll have lots of scores,” she continued, “but if you can’t compare them to each other, what will be the value to be gained? A lot of money will be spent to produce tons of data for no purpose.”
Gordon M. Ambach, the executive director of the Council of Chief State School Officers, argues otherwise. “The federal interest here is in accountability related to the use of federal funds,” he said. “In our judgment, to meet the federal requirement ... it is not essential to have six separate data points for reading and math for every year from virtually every schoolroom in the country.”
With the exception of Texas, Mr. Ambach noted, none of the states that scored in the top 10 in math or reading on the 1996 and 1998 state NAEP, which tests a sampling of students, currently tests every year in grades 3-8.
“Where the state is testing at all grade levels now, that’s fine,” he added. “But where states like Maryland and Virginia and New York and Vermont have testing at most of those grade levels, but not all, we object to a requirement that they would have to test at those other grade levels.”
But others say states are unlikely to make changes unless compelled to do so. “Fundamentally, [the opposition takes] a status quo position that says we like the testing programs we’ve got. Don’t make us change,” said Chester E. Finn Jr., the president of the Washington-based Thomas B. Fordham Foundation and a former assistant secretary of education in the Reagan administration. “That’s the sub-theme here. It’s not even very subtle.”
“[The Senate bill] does not mandate that it has to be one way, so in that regard there’s flexibility,” said Sandy Kress, an education adviser to Mr. Bush. “But the standards are so rigorous, I think it’s going to be very tough for some of these multiple test-type systems, or proposals, to pass muster” with the U.S. secretary of education, who must approve states’ plans. “The secretary will expect a test or tests that will allow for a uniform measurement of progress throughout the state.”
What’s the Purpose?
In part, the differences of opinion stem from a lack of clarity about the White House’s goals for annual testing in grades 3-8.
If the chief purpose is to compare the performance of individual students, schools, and districts, “then you can really only do that if you have a common kind of yardstick to measure everybody with,” said Michael J. Kolen, an education professor at the University of Iowa and a past president of the National Council on Measurement in Education.
In 1997, Congress asked the National Research Council to study the feasibility of devising a common scale that would allow test scores of individual students to be compared across existing commercial and state tests and NAEP. The NRC panel that studied the issue concluded that technically “linking” the scores of those who took different tests was “not feasible.” (“Panel Finds No Tests Comparable to Ones Clinton Espouses,” June 17, 1998.)
Paul W. Holland, who chaired the committee, cited a number of problems stemming from such comparisons: Different tests are designed to measure different, but possibly overlapping, content; may be of greater or lesser difficulty; use a different mix of test questions and format; may have greater or lesser consequences attached to test results; and may be given under slightly altered conditions. Deviations in any one of those factors, he said, can change students’ scores and the validity of drawing inferences across tests.
“There’s this dream that we could put all these tests on a common scale, and it could be done by magical statistics,” said Mr. Holland, who holds the Frederick Lord chair in measurement and statistics at the Educational Testing Service, the Princeton, N.J.-based test-maker. “When you think about what this poor miracle is being required to do, it’s actually pretty hard.”
Similarly, if the primary goal is to track the growth of individual students from grade to grade, most testing experts say, then states are better off using the same test battery in every grade.
“It’s technically conceivable to think of doing it with different tests,” said Brian Gong, the associate director of the National Center for the Improvement of Educational Assessment, a nonprofit group based in Portsmouth, N.H., “but I think that would not be a near-term project.”
Meanwhile, the Business Coalition for Excellence in Education, an alliance of nearly 80 companies and business associations, has been lobbying hard for annual tests that would be comparable across schools and grade levels.
“The purpose of testing is student improvement,” said Roberts T. Jones, the president of the National Alliance of Business, a member of the coalition. “If testing is to be useful to parents and schools and teachers and everybody else in the process, then being able to assess where a student is from year to year on a competency-learning curve, and then remediating those who need it, is an important thing.”
In contrast, if the primary purpose is to hold schools and districts accountable for their performance—as Mr. Ambach of the state chiefs’ council suggests—then using different tests in different grades or looking at performance in a handful of grades appears to pose less of a problem.
Many states now rate schools based on the performance of students in only a few grades, for example. In addition, experts suggest, if progress is measured by seeing if one year’s 3rd graders performed better than the previous year’s 3rd graders, and the same held true for 4th graders, it wouldn’t matter if different tests were used in the two grades.
Michael H. Kean, the vice president of public and governmental affairs for CTB/McGraw-Hill, one of the nation’s largest commercial-test publishers, said that although it’s not “equating” in the technical sense, states could set scores on their own tests to signal “proficiency” on state standards, and other scores on commercially written exams, or locally administered tests, that would also indicate proficiency. States could then track changes in the percentage of students who scored at the proficient level in every grade, from year to year, regardless of the test.
The idea of having a mix of state and local testing to measure the value that individual schools or districts add to students’ achievement is “not as big a deal as some folks are trying to make it,” said William L. Sanders, the manager of the educational value-added assessment service at SAS in Schools, a software company in Cary, N.C. “It does put a premium on doing the right analytical work, but it’s certainly doable.”
The Association of American Publishers, of which CTB/McGraw-Hill is a member, is one of many groups urging exactly that kind of flexibility. Those groups argue that states have spent the better part of a decade setting up testing systems that measure student performance against state standards in key grades. To design such tests in every grade, they assert, would be extremely costly and unnecessary.
The National Association of State Boards of Education, for example, estimates it could cost as much as $7 billion over seven years for states to design and administer reading and math tests each year in grades 3-8. The most conservative estimate, according to NASBE, is $2.7 billion for simpler, cheaper tests.
“The costs are staggering, but not surprising to states who have put in place comprehensive assessment systems over the past 10 years,” said Brenda L. Welburn, NASBE’s executive director. “Unfortunately, neither the president nor Congress has really considered the incredible expense states will incur from this sweeping change in assessment policies. Worse, national leaders have tried to downplay these very real, very high costs.”
The administration requested $320 million to help states pay for testing in fiscal 2002. The pending Senate education bill would authorize $400 million for test development each year, while the House bill would authorize $320 million.
NASBE based its estimates on an average cost of $25 to $125 per student for test development and $25 to $50 per student for test administration, figures drawn from advice from CTB/McGraw-Hill.
But the Education Leaders Council, which represents eight state school chiefs, said the NASBE figures “grossly overestimate” the cost of state testing. The high end, for example, assumes that every state would use the most costly, performance-based measures in every grade and ignores the fact that some states already test reading and math in grades 3-8. A study released earlier this spring by Stateline.org, which reports on state policy issues, concluded that states are spending about $400 million in fiscal 2001 on test development and administration, based on a survey of state testing directors.
“Even the minimum NASBE estimates far exceed the actual cost of developing and administering a test in the state of Arizona,” said Arizona Superintendent of Public Instruction Lisa Graham Keegan. “The use of these numbers is highly irresponsible.”
Still, some argue that if states are required to write and administer tests in grades 3-8 every year, particularly within the short time frame that President Bush has recommended, it could push states away from high-quality assessments linked to their standards.
States that have designed more costly, ambitious assessments in three grades are unlikely to be able to afford similar testing in six grades, even with federal help, said Mr. Gong of the National Center for the Improvement of Educational Assessment. “I think this will drive the adoption of multiple-choice tests,” he said, “and I think it will be an open question about whether states retain their other tests.”
Some of the sharpest opposition to Mr. Bush’s proposal has come from conservatives within his own party, who worry that it could lead to a federally dictated curriculum, and from liberal Democrats, who are concerned that more testing would harm poor and minority children.
For example, at a recent forum sponsored by the Washington-based Cato Institute, a free-market think tank, Alfie Kohn, the author of The Case Against Standardized Testing, predicted that mandatory annual testing of the kind Mr. Bush supports would worsen the achievement gap between schools for poor and nonpoor students, by turning the former into “giant test-prep centers.”
“I don’t think that the answer is standards and testing,” said Darcy A. Olsen, the director of education and child policy for the Cato Institute. “This is blind faith. There’s no reason at all to believe that yet another test will help Johnny learn to read better.”
A version of this article appeared in the May 9, 2001, edition of Education Week as “Bush Test Plan Fuels Debate Over Uniformity.”