Education Funding Commentary

Making a Silk Purse...

By Catherine E. Snow & Jacqueline Jones, April 25, 2001
This is how a national system of annual student testing might work.

A centerpiece of President Bush’s education plan, currently under discussion in Congress, is the proposal for annual math and literacy tests for all children in grades 3-8. The benefit envisioned is the improvement of schools. The costs budgeted are not negligible. If adopted, the testing plan would cost $400 million and would absorb many classroom hours that might otherwise be devoted to instruction. Thus, the proposal should be judged on both its potential benefits and its costs, and should be formulated to maximize the likely benefits.

Like previous large-scale attempts at education reform, this proposal could be a lever for improvement, or it could be an expensive, time-consuming, misdirected, and frustrating failure. For this initiative, as for others, the devil is in the details.

Under what conditions would annual testing actually generate educational improvements? In what form should the annual testing be implemented to achieve its desired effects? Let’s begin with some necessary preconditions.

The conditions ensuring the greatest benefits from annual testing include an enhanced public understanding of how tests work and what they tell us. The public needs to understand, for example, that tests, by themselves, cannot improve educational outcomes. They can lead to improvement only if they become a stimulus to change in the educational system—a basis for improved curricula, upgraded instruction, better professional development for teachers, and better distribution of resources.

While holding school districts, schools, and teachers accountable is only fair, the public also needs to understand how financial resources, student demographics, and teacher preparation affect a school’s performance. Since these contextual factors are usually outside a school’s control, it is not fair to ignore them in comparing school outcomes. Accountability systems can work if they give schools specified goals and undercut easy excuses for failure. But the results of accountability testing can also be misleading. If we simply compare scores across schools without taking into account change over time, schools that have shown great improvement can look bad in comparison with schools where children score higher but make less progress.

How, then, should President Bush’s annual test be implemented to maximize the stimulus to educational improvement and to minimize damaging effects? We propose that the two crucial features of an effective annual testing system would be a mechanism for using the test results to distribute instructional resources and a mechanism for minimizing both teaching to the test and likely misinterpretations of the results.

Use scores for improving instruction. When administered in the context of an ongoing program of classroom-based assessment and professional development, properly selected and properly interpreted tests can do the following: provide information about children’s performance levels; identify the children who need extra instructional attention; and identify the classrooms in which teachers need extra instructional support.

But fulfilling these various functions requires selecting the appropriate tests, properly interpreting test results, and then actually using test results to inform instruction. Remarkably, the individuals responsible for making testing decisions typically know rather little about how to select, interpret, or use test results. We can hardly expect administering millions of tests to improve education if few educators know how to use the data.

It is a basic principle of test design that different functions require different tests. Our proposal violates this principle in suggesting that an accountability test could also provide instructional information. We suggest that the annual test should be designed as a screen, to identify children who need help mastering the basics of math and reading. The information about the number of children who achieve scores above the cutoff, if appropriately filtered (see below), can reflect school effectiveness. At the same time, this information identifies children who need further, more diagnostic assessments that can be used to help teachers decide what sort of instruction to provide. We propose that it also be used as a basis for distributing professional-development resources according to need.

Test scores within a classroom should become the basis for allocating professional development and teacher support to that classroom. Thus, classrooms in which a very high percentage of children receive scores below the acceptable level would receive more aid, in the form of instructional mentoring or coaching for the teacher, help in administering follow-up assessments designed to guide instruction, and resources for supplementary materials or extra classroom personnel. Classrooms in which only a small percentage of children scored below the cutoff would receive less aid. Of course, if tests are to be used to target instructional support, then they must be administered in such a way that the information from them is available immediately and early in the school year. Thus, we argue that an early-fall administration of these tests is highly desirable.
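
One way to picture the allocation rule described above is the following minimal sketch. It is an illustration under stated assumptions, not a method the proposal specifies: the cutoff of 50, the fixed pool of 90 support hours, the proportional split, and the names `Classroom`, `share_below`, and `allocate_support` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Classroom:
    # Hypothetical record: a classroom and its fall screening scores.
    name: str
    scores: list[float]


def share_below(scores, cutoff):
    """Fraction of children scoring below the cutoff (0.0 for an empty class)."""
    if not scores:
        return 0.0
    return sum(s < cutoff for s in scores) / len(scores)


def allocate_support(classrooms, cutoff, total_hours):
    """Split a fixed pool of support hours in proportion to each
    classroom's share of children below the cutoff.

    Proportional splitting is an assumption made for illustration;
    the text says only that needier classrooms should receive more aid.
    """
    shares = {c.name: share_below(c.scores, cutoff) for c in classrooms}
    total = sum(shares.values())
    if total == 0:
        # No classroom has children below the cutoff: nothing to allocate.
        return {name: 0.0 for name in shares}
    return {name: total_hours * s / total for name, s in shares.items()}


# Made-up scores, for illustration only.
rooms = [
    Classroom("Room A", [35, 42, 58, 61]),  # 2 of 4 below the cutoff
    Classroom("Room B", [55, 62, 48, 71]),  # 1 of 4 below the cutoff
]
hours = allocate_support(rooms, cutoff=50, total_hours=90)
```

In this made-up case, Room A has twice Room B's share of children below the cutoff, so it receives twice the support hours. A real system would have to decide how the cutoff is set and whether aid should scale linearly with need.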

From data to information. Of course, tests can provide data about how schools are doing. But such data do not constitute information about school performance if we just compare test scores across schools. We need to compare children’s test scores across time. Since in some urban areas 30 percent to 50 percent of students in a classroom in April may not have been there in September, a school’s average test score is based on the performance of many children who have hardly received instruction in that school setting. Particularly in high-transiency settings, a school’s average test scores reflect who showed up on the day of testing, not how much the school has taught its children.

Furthermore, the huge differences in test performance between urban and suburban schools often point to the experiences children bring with them as much as the experiences schools provide. Finally, the financial resources available to suburban schools are much greater than those available to the schools that typically score poorly. In using test scores to judge schools, we must disaggregate the impact of student mobility, school resources, and the extent to which children arrive already knowing what the school is trying to teach.

We suggest that any test’s use for school accountability purposes must be limited to data from children who have been in the school for at least a year, and that schools should be held more accountable for their longer-enrolled students. Doing this would require student identification procedures so that students’ school histories could be established (amazingly enough, many large urban districts do not currently have this capacity), and it would require tests designed to be comparable across grades 3-8. Comparing scores across differently designed tests is extremely difficult, if not impossible. If we wish to invest in accountability, we need to invest in designing tests that can give us the information we need to make sound decisions.
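
The filtering described above might be sketched as follows. This is a rough illustration, not a prescribed formula: it assumes scores are already on a scale comparable across grades, and the `StudentRecord` fields, the `accountability_gain` function, and the choice to weight each child's gain by years enrolled are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Hypothetical record keyed by a unique student identifier.
    student_id: str
    years_enrolled: float   # time spent in this school
    prior_score: float      # last year's score, on a cross-grade scale
    current_score: float    # this year's score, on the same scale


def accountability_gain(records, min_years=1.0):
    """Average score gain for children enrolled at least `min_years`,
    weighting each child's gain by years enrolled.

    Weighting by years enrolled is one assumed way to hold a school
    "more accountable" for its longer-enrolled students.
    Returns None when no child meets the enrollment threshold.
    """
    eligible = [r for r in records if r.years_enrolled >= min_years]
    total_weight = sum(r.years_enrolled for r in eligible)
    if total_weight == 0:
        return None
    weighted = sum(
        r.years_enrolled * (r.current_score - r.prior_score) for r in eligible
    )
    return weighted / total_weight


# Made-up records, for illustration only.
records = [
    StudentRecord("s1", 3.0, 200, 212),  # gain of 12, weight 3
    StudentRecord("s2", 1.0, 190, 194),  # gain of 4, weight 1
    StudentRecord("s3", 0.3, 180, 220),  # enrolled under a year: excluded
]
gain = accountability_gain(records)
```

Here the recently arrived child's large gain does not count toward the school's figure, and the three-year student counts three times as much as the one-year student. None of this is possible without the student identifiers and cross-grade comparability the paragraph calls for.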

When to test? If testing is meant to improve instruction, then end-of-year tests are worthless. Results from tests administered in spring are typically not even available to teachers until the next fall, by which time the children whose test scores they receive have moved on. Furthermore, even if the test scores were available immediately, they would arrive too late in the school year for changes in instruction to have much effect.

An additional disadvantage of administering accountability assessments in the spring is that it creates both pressure and considerable opportunity to teach to the test. While President Bush may believe that teaching children to perform well on a math or a reading test is equivalent to teaching math or reading, this is simply not true. Tests, by their very design, reflect only a sample of what we want children to know. Teaching the sample is not equivalent to teaching the entire curriculum. While in very dysfunctional schools teaching to the test may be better than what goes on normally, in most schools it represents a narrowing of the curriculum and a waste of precious instructional time.

Making it work. For a system such as we propose to work, the annual screening tests selected would have to be relatively brief, standardized in administration, machine-scorable, and able to identify those children who need help in basic math and reading. If states are to choose their own tests, they would need a set of guidelines for selecting the screening test and guidance in prescribing appropriate follow-up assessments. A national test-review board might well be established to provide support in making these decisions.

If the tests are used to distribute professional development to those classrooms most in need, a coherent professional-development system, probably requiring increased funding, would be needed in every school district. Finally, as noted above, unique student identifiers that would make it possible to track individuals’ progress are needed for interpreting the data appropriately.

If a national system of annual testing is inevitable, experts in testing must be recruited to think creatively about how to make it serve both accountability and instructional needs. Teachers, principals, school board members, and the general public need information that can help them interpret test results appropriately.

The testing system must remain focused on upgrading instructional programs. Testing children does them no good unless it guides teachers in providing improved instruction, which in turn requires greatly enhanced professional development and support.

Annual tests should be one piece of an integrated system of ongoing classroom-based assessment and professional development, targeted where the need is greatest.

Catherine E. Snow is the Henry Lee Shattuck professor of education at Harvard University’s graduate school of education in Cambridge, Mass., and a member of the Board on Testing and Assessment. Jacqueline Jones is a visiting associate professor at the graduate school and a senior research scientist at the Educational Testing Service in Princeton, N.J.

A version of this article appeared in the April 25, 2001 edition of Education Week as Making a Silk Purse...
