President Bush has made annual testing of 3rd through 8th graders a centerpiece of his education agenda, saying such reading and math tests provide a “vital diagnostic tool for schools.” But in states that already engage in annual testing in those grades, educators are divided over its usefulness.
“It’s very powerful,” said Rosaena Garza, the director of academics for the 40,000-student district in Corpus Christi, Texas, referring to her state’s assessment system. “A principal has a very good picture of his or her campus. If you’re seeing a pattern in a particular teacher’s classroom, then you can begin to look at what’s happening in that classroom.”
But Linda L. Clark, the director of instruction for the 25,000-student Joint School District No. 2 in Meridian, Idaho, finds that the test her state gives, the Iowa Tests of Basic Skills, is useful mainly as an instrument for “ranking and sorting students.”
“Our question of the state board was, ‘How many times during a student’s career do they need to be ranked and sorted?’ ” Ms. Clark said.
The different attitudes can be traced in large measure to the types of tests states give, the details they provide schools about test results, and the timeliness of the information. Mr. Bush has modeled his proposal on the Texas system, which provides schools with detailed, grade-by-grade information about how students perform according to the state’s standards. Yet few state assessment programs resemble that of former Gov. Bush’s Texas.
The Texas Assessment of Academic Skills is a “criterion referenced” test, meaning it was designed specifically to measure achievement against the state’s standards in reading and mathematics in grades 3-8 and at the high school level. It also measures achievement in writing, science, and social studies in some of those grades.
Texas schools receive information on whether each grade level and each student have mastered a given objective, such as “summarization,” including the number of test items answered correctly. The state holds schools accountable for achieving minimum passing rates for all students, poor and minority students included.
“You can look and see how the kids did before they went into a teacher’s classroom and how they did when they left, and it’s all broken down by objectives,” said Deborah Scates, the principal of the 500-student Paul R. Haas Middle School in Corpus Christi.
“It points out strengths, too,” she continued. “Last year, I had a couple of 7th grade teachers, who I discovered are wonderful teaching algebraic functions, which is a very difficult concept. That helps me know where I need to place teachers in the school, so they can teach to their strengths.”
An Imperfect Match
Many educators elsewhere say that while their states’ test results are useful up to a point, they don’t provide enough information to guide instruction.
In Tennessee, for example, students in grades 3-8 take the multiple-choice component of the TerraNova-2nd Edition, created by commercial test-maker CTB/McGraw-Hill. But Earl H. Wiman, the principal of the 522-student Alexander Elementary School in Jackson, said schools typically get the data too late to be useful. “We took the test last April. We got information in October.”
In addition, Mr. Wiman contends, the test does not reflect the state’s academic standards closely enough to help focus instruction.
“If we’re going to hold schools accountable, we need to very clearly identify for teachers and schools what needs to be taught, and we need to very clearly identify for teachers how that’s going to be tested,” he said. “That link is just not there in so many different testing programs.”
TerraNova is a norm-referenced test, meaning it was designed primarily to compare the performance of students with that of their peers nationally, and not to measure how they perform based on a state’s own standards. States and districts can pay more to help customize the exam.
California currently rates schools based chiefly on the Stanford Achievement Test-9th Edition, produced by Harcourt Educational Measurement, another norm-referenced test given every year in grades 2-11.
“If you take the standards that we have in this state and you align them with that test, you’re not going to get a perfect alignment,” said Charles G. Jackson, who directs instructional-support services for District G, a 62,000-student subdistrict of the Los Angeles Unified School District. “So even the specific information that we get back may not match what the efforts need to be at the school level.”
Ms. Clark’s Idaho district has received a waiver to administer the state’s norm-referenced test less often because the district believes its own testing program is more useful. Although the state administers the ITBS, which is produced by Riverside Publishing Company, in grades 3-8, the district administers the test only in grades 3 and 7.
“What we’re interested in having is a comprehensive assessment program, based on multiple measures, that measures a student’s growth toward the standards,” Ms. Clark said, “and in that kind of a system, there’s a lot of power, in terms of being able to plan instruction.”
As the backbone of its testing system, Ms. Clark’s district uses the Achievement Level Tests produced by the Portland, Ore.-based Northwest Evaluation Association, a nonprofit group. The district can create tests that reflect its standards from a bank of more than 15,000 test items and use the results to measure the growth of individual students over time.
Homing In on Data
One of the primary arguments for testing each child each year is to be able to track year-to-year growth. Some contend that such information provides a fairer way to judge schools, based on how much schools “add value” to a student’s knowledge and skills.
Tennessee, for example, uses the results from the TerraNova to focus on the gains students make over time. William L. Sanders, a research fellow at the University of North Carolina, who helped develop the approach, is a strong advocate of so-called value-added testing. If states test in only a few grades, he argued, it’s hard to pinpoint where problems lie.
“In terms of thinking about getting positive diagnostic information, I see no way to do it without having at least annual testing,” said Mr. Sanders, who also directs the value-added assessment and research center at the Cary, N.C.-based SAS Institute, a for-profit software company.
The 50,000-student Minneapolis school district also uses a value-added approach. “We like the notion of annual testing,” said David Heistad, the director of research and evaluation for the district. “We can home in on which grade levels at which schools are producing the greatest gains and beating the odds, and we can also find relative weaknesses within a school. If you test too infrequently, you can never localize the information to a particular grade level or classroom.”
But, he added, “the quality of the assessment is the key for me. We would continue to report out our local measures, even if the state went to annual testing, simply because ours are so clearly aligned with our standards.”
‘Too Much Hullabaloo’
Mr. Sanders maintains that if a test is highly—but not perfectly—correlated with a state’s curriculum objectives, measures the progress of students at both the high end and the low end of the performance scale, and is reliable, it doesn’t matter whether it’s norm-referenced or criterion-referenced. One criticism of the Texas tests is that they do not measure the progress of students at the high end of the scale.
“There is too much hullabaloo made over distinctions among tests,” Mr. Sanders said. “When you’re using these things as indicators of student progress over time, then a lot of these distinctions blur.”
Florida is one of the few states that, beginning this year, will give students in grades 3-8 both a norm-referenced test in reading and math and a criterion-referenced test designed to measure its own standards.
The results of the latter will be used to gauge whether students are mastering a year’s worth of learning in a year’s worth of time, said JoAnn Carrin, a spokeswoman for the Florida education department. What’s more, she said, the tests will be tied to the state’s standards, “which is the real important part of the whole program.”
Mario J. Crocetti, the principal of the 1,500-student Wellington Landing Community Middle School in Palm Beach County, Fla., said that until now the usefulness of the state’s testing system has been limited because “all we’ve been able to do is compare this year’s 8th grade class to last year’s 8th grade class. That’s been a real stumbling block.”
Back to the Future?
Until Congress rewrote the law in 1994, the federal Title I program for disadvantaged students (known at the time as Chapter 1) required districts to assess the achievement of students in the program annually in grades 2-12, using a national norm-referenced test. The 1994 rewrite instead required states and districts to use tests that actually reflected a state’s standards and that were the same tests used to measure the performance of other students in the state. The tests must be given at least once in each of three grade spans: grades 3 through 5, 6 through 9, and 10 through 12.
In part, Congress was reacting to a 1993 evaluation of the program that found norm-referenced tests provided objective, reliable information for relatively little time and money, but were having some undesirable consequences in schools.
In particular, noted the “Final Report of the National Assessment of the Chapter 1 Program,” because the tests were designed to be independent of the local curriculum, they could not give teachers much help in pinpointing the parts of the curriculum in which a student needed more work. In addition, the report found, the reliance on multiple-choice questions was encouraging teachers to spend too much time on test-taking skills or low-level basic skills instead of on more challenging academic content.
The report also raised concerns, at a time when states were instituting challenging academic standards for all students, that the Chapter 1 program was encouraging children in high-poverty schools to be held to lower expectations.
It wasn’t until this year that the U.S. Department of Education began holding states responsible for having in place new testing programs that meet the Title I requirements.
Now, President Bush wants states to return to annual, grade-by-grade testing as a means of strengthening accountability for the use of federal Title I aid. (“States Lagging Behind on Title I Rules, Ed. Dept. Says,” Jan. 31, 2000.)
A Bush adviser said last week that, ideally, the administration wants states to use standards-based exams, but that they could use an off-the-shelf, norm-referenced test if they chose. The same adviser said the administration could propose as much as $100 million to help states write, and perhaps administer, such tests the first few years.
Two central concerns are the costs of creating new exams and the capacity of the testing industry to handle the demand. In the past five years, state testing expenditures have almost tripled, from $141 million to $390 million, according to Achieve, a Cambridge, Mass.-based nonprofit group that promotes standards-based initiatives. One study estimated the average cost of multiple-choice tests at $17.50 per student in 1998 dollars, while tests with any “performance” section, such as essays or short-answer questions, averaged about $28 per exam.
Many states test in only a few grades because of the cost and time entailed in more extensive testing, said Brian Gong, the associate director of the National Center for the Improvement of Educational Assessment, a nonprofit organization in Dover, N.H. Those states also believe that less frequent testing is adequate to hold schools accountable for whether students are meeting standards.
“Development is a small part of the investment,” Mr. Gong added. “It’s the every-year operational cost that is driving most of these decisions, and testing time. If you test in a number of subjects every year, then what you end up having is machine-scoreable, multiple-choice tests, and a lot of states say, philosophically, we don’t think that will influence instruction the way we want.”
To lessen costs, Mr. Sanders, the expert on value-added testing, suggested that states with similar standards form “buying cooperatives” that would enable them to pool their resources for the construction and maintenance of testing programs.
But he acknowledged that, in the short term, the three big testing companies, which handle the bulk of state contracts—CTB/McGraw-Hill, Riverside Publishing Co., and Harcourt Educational Measurement—will have trouble keeping up with demand if President Bush’s proposal is enacted.
“There are definitely snafus because they’re running at more than full capacity,” Mr. Sanders said. “That problem will get exacerbated for a while, but I think it’s like the utility problem on the West Coast. At some point in time, you’ll have more power generation, and at some point in time, you will have more capacity to do this.”
The real problem, he and others say, is that even with the best assessments, too few schools use the test results for diagnostic purposes or have staff members who are trained to do so. “If I knew how to accelerate people learning to use the data in positive ways, I would do it,” Mr. Sanders said. “That’s the frustrating part.”
Meanwhile, even educators who make good use of assessment data worry about more testing. “That’s the biggest complaint I hear from teachers,” said Laurie B. Abeel, the assessment coordinator for the 9,600-student Fauquier County schools in Virginia. “I’m testing all the time, and when do I teach?”
A version of this article appeared in the February 21, 2001, edition of Education Week as “Usefulness of Annual Testing Varies by State.”