Now that President Bush has signed the “No Child Left Behind” Act of 2001, states will soon be implementing reading and mathematics tests for all students in grades 3-8 and imposing tough sanctions on schools where students do poorly. Will the strict accountability provisions included in the law promote student achievement and improve poorly performing schools? Researchers who study test-based accountability know that the new state systems are likely to produce some less desirable results. And they know some ways that states can make their systems work better.
What are the likely results? Although there are still many unanswered questions about high-stakes testing and accountability, there is a body of evidence drawn from Vermont, Florida, Kentucky, Texas, California, and other states about what will happen as states implement the new, tougher testing policies.
First, we can expect average scores on these accountability tests to rise each year for the first three or four years. Teachers and administrators at both low- and high-scoring schools will shift their instruction in ways that result in score increases. States that implemented test-based accountability have all seen their scores rise, and in some cases the increases have been dramatic.
Second, we know that these large gains will, to some extent, not reflect real gains in the knowledge and skills the tests were designed to measure, a phenomenon known as “score inflation.” There is extensive evidence that students’ scores on high-stakes tests rise faster than their scores on other standardized tests given at the same time and measuring the same subjects. In other words, students do not actually know as much as the high-stakes scores alone would suggest. Thus, a likely result of accountability is that the test scores themselves will be less accurate than they were prior to the addition of high stakes.
Third, we are likely to see an increase in emphasis on tested subjects and a decrease in emphasis on subjects that are not tested. When students and schools are held accountable only for reading and mathematics, class time is taken away from other subjects, such as writing, social studies, and art. Similarly, in the subjects that are tested, we should expect a decrease in emphasis on skills and content that are not covered by the tests. For example, if states adopt multiple-choice tests (which are the most economical alternative), less attention may be paid to the elements of reading and mathematics that do not lend themselves to multiple-choice testing.
Fourth, there is likely to be an increase in undesirable test-related behaviors, such as narrowly focused test-preparation activities that take time away from normal instruction, and even cheating.
Fifth, we can expect large annual fluctuations in many schools’ scores. Some schools that make the greatest gains one year will see these gains evaporate the next year. Schools whose teachers earn large bonuses one year may have stagnant scores the next, as occurred in California. This volatility in school scores comes from a variety of factors, including student mobility, measurement error, and other transitory conditions that affect test scores.
Sixth, the sanctions imposed on low-performing schools will not ensure that students in those schools are not “left behind.” The record of success on the specific sanctions imposed by the law, including staff reassignment and school takeover, is mixed. There is no guarantee that students in low-performing schools will be helped by these policies, and some risk that they will be harmed.
What should states do? A number of steps should be taken to maximize the benefits and minimize the harm done by test-based accountability. The following recommendations are not exhaustive, but they address the major concerns we’ve raised.
As a first step, states need to monitor the extent of score inflation. The amount of inflation is likely to depend on the specific features of each state’s testing program (for example, whether the same test items are used year after year). States are required to participate in the National Assessment of Educational Progress testing in grades 4 and 8 every other year, which provides a starting point for examining score inflation. States need to establish a plan for studying the NAEP results and interpreting them at the state level, and they need to consider supplementing NAEP with alternative measures in other subjects and at other grade levels.
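One simple way to think about this kind of monitoring is to compare a state's gain on its own high-stakes test with its gain on an independent audit test such as NAEP over the same period, after putting both on a comparable scale. The sketch below is purely illustrative; the numbers and function name are hypothetical, not drawn from any actual state or NAEP data:

```python
# Toy sketch of a score-inflation check: compare gains on a state's
# high-stakes test with gains on an audit test (e.g., NAEP), expressed
# in standard-deviation units so different score scales are comparable.
# All figures below are hypothetical.

def gain_in_sd_units(first_year_score, later_year_score, score_sd):
    """Express a score gain in standard-deviation units."""
    return (later_year_score - first_year_score) / score_sd

state_gain = gain_in_sd_units(210, 230, 40)  # 0.5 SD on the state test
audit_gain = gain_in_sd_units(215, 219, 32)  # 0.125 SD on the audit test

# A large gap between the two gains is one signal of possible inflation.
inflation_signal = state_gain - audit_gain
print(inflation_signal)  # 0.375
```

A real analysis would, of course, need comparable student populations, content frameworks, and testing conditions; this only shows the basic logic of using an external test as a check.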
States should consider expanding “what counts” in the state accountability system to include more than just reading and math. This could be done by testing other subjects; the overall testing burden could be limited by varying subjects and grade levels over time, and by using sampling approaches that do not require every student to take every test or answer every question. States should also include measures of what content is taught and how it is taught. This information reveals otherwise-hidden shifts in practice while sending signals that other subjects are important.
As a basis for doing more sensitive analyses, states need to create student-information systems that enable them to link the test scores of individual students over time. Such data will enable states to track individual student progress, whether a student remains in the same school or transfers. This type of data is especially important for understanding what happens to students in low-performing schools.
To help ensure that rewards and sanctions reflect real changes in student achievement, states should base rewards and sanctions on changes in two-year averages of scores, rather than on single-year changes. Another promising alternative to year-to-year comparisons of school-average scores is to adopt value-added approaches in which students are compared against their own prior scores.
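The smoothing effect of multi-year averaging can be shown with a toy calculation (the school scores and function names below are hypothetical, invented only to illustrate the point):

```python
# Toy sketch: why averaging two years of school scores damps the
# year-to-year volatility described above. All scores are hypothetical.

def single_year_change(scores):
    """Change from the second-most-recent year to the most recent year."""
    return scores[-1] - scores[-2]

def two_year_average_change(scores):
    """Change between the averages of two adjacent two-year spans."""
    recent_avg = (scores[-1] + scores[-2]) / 2
    prior_avg = (scores[-4] + scores[-3]) / 2
    return recent_avg - prior_avg

# Four years of school-average scores with a one-year spike in year 3.
scores = [650, 652, 668, 654]
print(single_year_change(scores))       # -14: the spike dominates
print(two_year_average_change(scores))  # 10.0: a steadier upward trend
```

Under a single-year rule, this hypothetical school would look like it lost ground in year 4; the two-year averages instead show a modest, steady gain, which is closer to what its four scores as a whole suggest.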
Finally, states should monitor the progress and practices of schools that are subject to interventions, including staff reassignment or takeover, to ensure that these changes are resulting in better learning environments for children.
Although the new federal law has many attractive features, it contains inadequate provisions for review and improvement. To ensure that no child is left behind and to make test-based accountability work better, we need to study what states do and how well they succeed. Fifty states will be struggling with these requirements, and they have been given very little guidance about how to proceed.
One of the good features of the new law is the requirement that states promote instructional methods that are scientifically based—that is, methods that have been evaluated and have evidence of success. We believe this same emphasis on research should be applied to the law itself. Test-based accountability will work better if we acknowledge how little we know about it, if the federal government devotes appropriate resources to studying it, and if states make ongoing efforts to improve it.
Brian Stecher is a senior social scientist in the education program at RAND in Santa Monica, Calif. He is also a member of the technical-design group advising the California Department of Education on the development of that state’s accountability system. Laura Hamilton is a behavioral scientist at RAND and a co-director of the RAND/Spencer postdoctoral program in education policy.
A version of this article appeared in the February 20, 2002 edition of Education Week as “Test-Based Accountability: Making It Work Better.”