Is the Race to the Top a Race to Test?

By Anthony Cody, August 10, 2009

According to an article in the Los Angeles Times, California’s state senate is scheduling hearings to consider abandoning a policy that blocks the use of student test score data for the evaluation or compensation of teachers, so that the state can qualify for federal “Race to the Top” funding. California recently created a data system to manage all the standardized test data. The Education Code was revised to state:

(c) Data in the system shall not be used, either solely or in conjunction with data from the California Longitudinal Pupil Achievement Data System, for purposes of pay, promotion, sanction, or personnel evaluation of an individual teacher or groups of teachers, or of any other employment decisions related to individual teachers. The system shall not include the names, social security numbers, home addresses, telephone numbers, or e-mail addresses of individual teachers. (California Education Code Section 10601.5, section c)

This clause has caught the attention of Secretary of Education Arne Duncan, who says it makes the state of California ineligible for his $4 billion “Race to the Top” fund. He intends to direct this fund towards what he considers innovative practices, including paying teachers more if they increase their students’ test scores.

Unfortunately, this approach dovetails neatly with the past seven years of federal pressure on schools to use test scores as the primary means of measuring learning.

On the face of it, it sounds reasonable to evaluate a teacher based on how well his students have learned. When I applied for National Board certification, I provided videotapes of my students engaged in discussions, samples of their work, and further evidence of all they had learned from my instruction. I deeply believe teachers should be responsible for how well their students learn. But the devil is in the details of how we measure learning.

How is Student Growth Measured?

The primary means of measuring student growth goes under the name Value Added Model (VAM). Using this method, some school districts have begun to analyze a teacher’s performance by examining her students’ growth during the time she taught them. In some ways, this seems more reasonable than the current NCLB practice of comparing this year’s students with last year’s, and expecting constant growth. However, a fascinating study was released in May that sheds some disturbing light on the flaws in this approach.

Princeton scholar Jesse Rothstein points out several key problems. To quote from the study summary:

Rothstein’s study focuses on the challenge of distinguishing a teacher’s contribution from pre-existing differences among students. Teachers do disparate jobs – some teach “gifted and talented” classes, some focus on students with limited English skills, and some work with students with special needs. If accountability and merit pay policies are to produce improvements in teacher quality, it is essential to ensure that teachers who get the “right” students who test well do not get unfair advantages, and that teachers who get the “wrong” students do not get unfair disadvantages. It will do no good, and may even cause harm, to implement a merit pay system that rewards teachers for working with gifted students and penalizes those who work with more challenging students.

The crux of any reform in the pay system is that it be fair.
If teachers working with the most challenging students face even more pressure to raise test scores, and are punished unfairly when their students do not perform for a variety of reasons beyond the teachers’ control, that will drive down morale and boost turnover in these schools.

Rothstein came up with a brilliant means of proving just how unfair this system is. If students are not assigned randomly to classes, then a teacher might be rewarded or penalized based on who was assigned to their class – and the pre-existing condition of these students.

To show this, Rothstein develops falsification tests for the VAMs. (Falsification testing evaluates an assertion by asking whether it has implications that are known to be incorrect.) Specifically, he asks whether the VAMs imply that 5th grade teachers (for example) have effects on students’ 3rd and 4th grade test scores. This test exploits the fact that future teachers cannot have causal effects on past outcomes, so a method that successfully distinguishes causal effects from pre-existing differences among students should not find signs of such effects.
In fact, the VAMs currently used for teacher accountability indicate that 5th grade teachers have large effects on students’ 3rd and 4th grade achievement. This reflects systematic sorting of students into classrooms on the basis of past achievement, producing substantial dispersion of students’ 4th grade scores and score gains – the growth in scores between 3rd and 4th grade – across 5th grade classrooms. Sorting on past reading gains is particularly prominent, though there is clear evidence of sorting on math gains as well.

Just to explain this further, imagine three fifth grade teachers: Ms. Best, Ms. Good, and Ms. Worst. If I analyze the scores of the fourth grade students heading into these three classrooms, and find that in fact most of the highest scoring students wind up in Ms. Best’s class, then I have uncovered a non-random, and therefore unfair, distribution. This is exactly what Rothstein found.
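For readers who like to see the logic spelled out, here is a minimal sketch of the idea behind the falsification check, using made-up simulated students rather than any real data. It is not Rothstein's actual model, which is a regression-based analysis; it only compares the average 3rd-to-4th grade gain of the students headed into each 5th grade classroom. The function names, the class count, and the score distribution are all assumptions chosen for illustration. If assignment were random, the classrooms should look about the same on past gains. If students are sorted by past performance, the future teacher appears to "affect" scores earned before she ever met the students.

```python
import random
import statistics


def classroom_means(prior_gains, assignment, n_classes):
    """Mean 3rd-to-4th grade gain of the students headed to each 5th grade room."""
    groups = [[] for _ in range(n_classes)]
    for gain, room in zip(prior_gains, assignment):
        groups[room].append(gain)
    return [statistics.mean(g) for g in groups]


def spread(means):
    """Gap between the highest and lowest classroom average."""
    return max(means) - min(means)


def simulate(n_students=300, n_classes=3, seed=0):
    """Return (spread under random assignment, spread under sorted assignment).

    All numbers are hypothetical: prior gains are drawn from a standard
    normal distribution, and classrooms are equal in size.
    """
    rng = random.Random(seed)
    prior_gains = [rng.gauss(0.0, 1.0) for _ in range(n_students)]

    # Random assignment: equal-sized classrooms, shuffled.
    random_assignment = [i % n_classes for i in range(n_students)]
    rng.shuffle(random_assignment)

    # Sorted assignment: the students with the highest past gains go to
    # room 0 ("Ms. Best"), the next group to room 1, and so on.
    order = sorted(range(n_students), key=lambda i: prior_gains[i], reverse=True)
    per_class = n_students // n_classes
    sorted_assignment = [0] * n_students
    for rank, student in enumerate(order):
        sorted_assignment[student] = min(rank // per_class, n_classes - 1)

    random_spread = spread(classroom_means(prior_gains, random_assignment, n_classes))
    sorted_spread = spread(classroom_means(prior_gains, sorted_assignment, n_classes))
    return random_spread, sorted_spread


if __name__ == "__main__":
    random_spread, sorted_spread = simulate()
    print(f"Gap in past gains, random assignment: {random_spread:.2f}")
    print(f"Gap in past gains, sorted assignment: {sorted_spread:.2f}")
```

In the sorted case the gap between classrooms is large even though no 5th grade teaching has happened yet, which is the warning sign the falsification test is designed to catch.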

I experienced this firsthand a few years ago, when one of my sixth grade math classes included six students who had been held back at least once, some of them twice. These students were much harder to move academically than others, and their concentration in that class made the group as a whole tougher to teach. A study released last year found that students in homes with domestic violence not only suffered academically, but also brought down the scores of their peers in the classes they shared.

An article in the San Francisco Chronicle reports that as many as 40% of the children in some urban neighborhoods suffer from post-traumatic stress disorder as a result of the violence they have witnessed at home and in their neighborhoods. This has profound effects on how these students learn. These problems are not distributed randomly in society, and these students are not distributed randomly within a school.

These are not excuses, but very real challenges faced by urban educators every day. It does not mean we give up on these students, but it does mean we have to be careful to craft solutions that support rather than further stigmatize these schools.

Rothstein’s research also showed that teachers who were successful in raising student scores in the short term, through test preparation for example, did not necessarily have a lasting effect on their students’ performance. This is hugely important. Our schools already suffer from an overemphasis on test preparation. If we actually tie teacher evaluations and pay to these scores, we are likely to deepen this emphasis, and our students will suffer in the long run.

In demanding a link between test scores and teacher evaluation and compensation, Duncan has chosen a rather poor vehicle for innovation. There is no question that teacher evaluation should be strengthened, and there is plenty of room for innovative approaches to compensation as well. But there is nothing innovative about more emphasis on test scores.

Way back in November of 2007, candidate Obama said,

And by the way - don't tell us that the only way to teach a child is to spend most of the year preparing him to fill in a few bubbles on a standardized test. Don't tell us that these tests have to come at the expense of music, or art, or phys. ed., or science. These tests shouldn't come at the expense of a well-rounded education - they should help complete that well-rounded education. The teachers I've met didn't devote their lives to testing, they devoted them to teaching, and teaching our children is what they should be allowed to do.

He was right about that.

Now Obama says that tying test scores to pay and evaluations will not result in teachers teaching to the test. I do not comprehend how this can be so.

He and Secretary Duncan seem to have very little imagination when it comes to finding out which teachers are doing a good job. They continually return to test scores as the essential measuring stick for teacher quality. Teachers know that this shortcut will lead to a dead end.

California State Senator Gloria Romero has called for hearings to eliminate the provision separating state testing data from evaluations and pay. If you would like to share your views, you can contact your California state senator or Assembly member.

What do you think? Is it time to evaluate and compensate teachers using student test scores?

