Is high-stakes testing like a race without a starting line?
Imagine the season’s final high school cross-country meet, run by the state Office of Racing. Officials pledge to demonstrate how each runner is doing, and which runners, teams, and coaches are the best. Judges crowd the finish line with cameras, computers, stethoscopes, and various diagnostic tools. As runners cross the line, not only their times, but also their pulses, breathing rates, perspiration levels, muscle oxidation, and a host of variables are checked and computed. This is the new world of racing, officials say. We will know how everyone is doing, who is the best, and why.
Now suppose, despite all this measuring, that the race has no official starting line. Some runners run 10 miles, others five, still others two. Some coaches and parents complain that the race is not fair, but the Office of Racing is not deterred. This is an absolute measure, officials say, not a relative one. When runners cross the finish line is what matters, not where they started. Sure, runners who start farther back have farther to go, but hey, that’s life. After all, this race is only one tool of many for assessing running ability. Yes, it will determine future access to athletics for many runners. And yes, average team results may eventually influence the coaches’ jobs. And, of course, teams may be reconstituted if this year’s team performs worse than last year’s. But despite these important-sounding consequences, this race is just one indicator of athletic performance. Now, everyone stop whining and try harder.
The idea of a race with no starting line is absurd, of course. In racing, we understand that the distance traveled is critically important in determining the outcome. In fact, the key factor is each runner’s speed, or rate of progress. To know this, we need to know when and where each runner started. Only then, we understand, can we compare them fairly.
It is also true, in racing, that while we track team scores, we know that they are an aggregate of individual performance. If we compare last year’s team to this year’s, we recognize that the different outcomes—whether better or worse—are significantly affected by having different members on the team. We understand that different runners naturally get different results, even with the same coach. Do we think that effort, practice, and coaching quality make a difference? Of course. Do we think these factors make up for having different runners and different starting lines? Of course not. The race described above may provide information on runners’ physical conditions, but it does not tell which runner is fastest or has the greatest endurance because it doesn’t measure how far they have run. For the same reason, it doesn’t show which team is best or which coach is most effective.
Those who follow education will recognize this analogy as representing many of the new high-stakes tests being introduced into public schools. The analogy is not perfect, I admit, but it illustrates how many tests work. I will use the Massachusetts Comprehensive Assessment System tests for my example, but many state testing programs have similar flaws.
MCAS is given in the 4th, 8th, and 10th grades, with additional tests on different topics planned for other grades. MCAS is a long test with open-ended questions—up to 20 hours over several days—so it provides a lot of information on students, like the race above. While some states use normed national tests that measure students against other students, MCAS is supposed to measure students against the state’s new learning standards.
In theory, educators often prefer such tests, because they provide information on where students are doing well and where they need help. For this reason, districts across the country have developed complex descriptions of achievement in areas such as writing and math, called rubrics. A child’s work is regularly judged according to these rubrics, which provide detailed examples of student work at each level and for each grade. Such rubrics, which apparently inspired MCAS, allow teachers to chart student progress from month to month and year to year, and are an excellent guide to each student’s strengths, weaknesses, and needs.
So far, so good.
But MCAS is not and cannot be used regularly with the same students. It provides no baseline data for each student (the equivalent of a starting line), so it cannot measure the student’s rate of progress. Rather, the baseline data that state officials cite are last year’s 4th or 8th grade scores compared to this year’s 4th or 8th graders—the same schools, but different students.
Thus, MCAS bases its judgment of improvement or decline on the scores of classes composed of entirely different members. It doesn’t know where the children in a given class, or the class as a whole, started the year. Consequently, it may indicate that 4th grade reading in a school has declined, even if this year’s 4th graders have improved over where they were last year as 3rd graders. What the state knows is that this year’s 4th graders scored lower than last year’s. But that isn’t what gets reported. The state says, and most people believe, that school performance has declined.
This problem is magnified when different schools are compared. Because MCAS doesn’t tell us how far students have come or how fast they have progressed, we can’t make any judgment on the quality of a school or the capacity of a student. We can’t tell whether students at one school started behind their peers at other schools or are being taught less. A student who has made enormous strides may still score poorly if he started behind his peers. Similarly, a teacher who consistently teaches two years of material in a year may be judged less competent than one who teaches only a half-year of material, simply because the second teacher’s students started at a higher academic level (closer to the finish line).
Because MCAS describes a student’s status, it may be useful to teachers. But because it does not measure the student’s progress, it is a poor indicator of school or teacher effectiveness, or of student capacity. To assess these, we need to calculate students’ rates of progress based on their starting and ending points. It doesn’t matter where a runner places in a race if we don’t know how far he has run. Also, because schools are not teams, we should be more concerned with the progress of individual students than with the class average.
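To make the difference between status and progress concrete, here is a minimal sketch in Python. The two classrooms and their grade-level numbers are invented for illustration and are not MCAS data. They mirror the hypothetical case above: one teacher covers two years of material with students who started behind, and another covers half a year with students who started ahead.

```python
from dataclasses import dataclass


@dataclass
class Classroom:
    """Hypothetical class record; levels are grade-level equivalents."""
    name: str
    start_level: float  # where students began the year (assumed known)
    end_level: float    # where they finished (the only thing a status test sees)

    @property
    def growth(self) -> float:
        # Progress over the year: finish line minus starting line.
        return self.end_level - self.start_level


def rank_by_status(classes):
    """Rank by end-of-year score alone, ignoring where students started."""
    return [c.name for c in sorted(classes, key=lambda c: c.end_level, reverse=True)]


def rank_by_growth(classes):
    """Rank by how far students progressed during the year."""
    return [c.name for c in sorted(classes, key=lambda c: c.growth, reverse=True)]


# Illustrative numbers only, not real test results.
classes = [
    Classroom("Teacher A", start_level=2.0, end_level=4.0),  # two years of growth
    Classroom("Teacher B", start_level=4.5, end_level=5.0),  # half a year of growth
]

print("By status:", rank_by_status(classes))
print("By growth:", rank_by_growth(classes))
```

Ranked by end-of-year status, Teacher B's class comes out ahead. Ranked by growth, Teacher A's class does. The second ranking is possible only when a starting point has been recorded for the same students.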
For these reasons, the Massachusetts Comprehensive Assessment System and similar tests are inappropriate for the high-stakes purposes to which they are increasingly being put. MCAS is used to rate the absolute performance of schools and classes against a standard, without knowing where the children started the year. Does that sound like the cross-country race?
The state may decide to impose sanctions on schools and teachers based on these results in the near future. Even more perniciously, it will soon deny graduation to individual kids based on their test scores, still without understanding their individual progress or effort. MCAS may not be a bad test, any more than the medical tests used in the race above are bad. Rather, it is the inappropriate use of the tests, in both instances, that causes the problem. This inappropriate use makes the race analogy relevant.
This raises the question of standards for testing. What ethical standards should obligate states when they create high-stakes tests for students? Many states are denying, or will soon deny, graduation to students who fail a test. Others plan to deny promotion to students as young as 4th grade based on a single test score. These tests are called “high stakes” for a reason: they make a substantive difference in students’ lives. Are we not obligated, when charting a course that may alter a child’s future, to make sure we are doing the right thing? Are we not obligated to make sure our tests are fair to all, that they measure what they say they measure, and that we actually know how far children have progressed from their own individual starting points before claiming to know their ability?
Think back to the race described earlier. Isn’t tying graduation or promotion to a state test like saying that only students who arrive at the finish line within a certain time period will be considered winners? If we don’t know how far the runners have run, we still don’t know how fast they are. Some strong runners will not arrive in time, just as some promising and hardworking students may be flunked or denied a diploma based on largely arbitrary and inequitable criteria.
If MCAS scores rise because struggling students drop out, is that progress? If students, teachers, and schools are punished for slower finish times when they had farther to run, will that improve schools? It’s easy to see why parents of runners who run the farthest object to ranking runners according to who crossed the finish line first.
Despite the rhetoric, MCAS and many other state tests take this same approach, tying high stakes for students to this uneven race. In Texas, one result has been a significant decrease in the number of poor and minority graduates. Do we know enough about these students (or their schools) to say that they are failures, or is it simply that they had farther to run?
There should be greater accountability in education, and it can be structured fairly. But accountability measures ought to assess the improvement of individual students based on their individual progress. If we continue to conduct tests as though they were races with different starting lines, people will be justified in thinking not only that the race needs fixing, but also that it has been “fixed.” Right now, we appear to be using education not as the great equalizer, but as the great divider—the institution that prevents those who start farthest behind from ever catching up.
Educational accountability does not have to be this way. But unless officials stop defending the indefensible and start working with their critics toward common goals, students will continue to suffer unfair consequences.
Donald B. Gratz is a senior associate and the coordinator of national school reform for the Community Training & Assistance Center in Boston. He also serves as a member of the Needham, Mass., board of education.
A version of this article appeared in the June 07, 2000 edition of Education Week as Fixing the Race