He's Got Your Number
These days, the hottest speaker on the education-policy circuit is a soft-spoken 58-year-old professor of statistics who has spent much of his career crunching numbers for agricultural researchers. "I turned down four speaking requests yesterday," says William Sanders, a professor at the University of Tennessee's school of agriculture and, oddly enough, the man who has created a controversial method of judging the effectiveness of teachers and schools. Last year, Sanders flew more than 30,000 miles on Delta Airlines alone, traveling, as he puts it, "from sea to shining sea" to give his standard hour-and-a-half presentation on the benefits of his evaluation system.
"The message is the same wherever I go and whoever I talk to," says the Tennessee native, a courtly man with gray hair who wears wire-frame glasses and speaks with a lilting Southern drawl. "I'm a spokesman for the data." By that, he means that the art of teaching can be quantified by taking students' test scores, plugging the numbers into a computer, and measuring how those students improve from one academic year to the next. Sanders claims his system is "more fair, more realistic, and more reasonable" than any other method of evaluating schools and teachers.
Not everyone agrees, but that hasn't stopped Sanders' system from gaining acceptance. Tennessee has used it since 1992 as part of a statewide school reform package. Florida is taking a close look at it, as are a number of school districts across the country. Some, like District 60 in Pueblo, Colorado, are already using it.
What policymakers find so appealing about the method is that it seems to filter out external factors—in particular, the socioeconomic levels of students—that make traditional school-by-school comparisons so suspect. For years, teachers and principals in poor inner-city schools have argued that because their students come from disadvantaged backgrounds, it is unfair to compare their scores on standardized tests with those of more affluent children. Sanders agrees and has devised a system that he says measures only the "value added" to a child's learning by a school or a teacher. The advantage of such an approach, he believes, is that it focuses on students' academic gains rather than their raw achievement scores. And by doing that, one can draw some conclusions about how schools, and even individual teachers, are performing.
"Society has a right to expect that schools will provide students with the opportunity for academic gain regardless of the level at which the students enter the educational venue," Sanders says. "In other words, all students can and should learn commensurate with their abilities."
The policy implications of Sanders' system are enormous. To supporters, it promises—finally—an objective method of teacher evaluation, one that is based on classroom results, not peer review or credentials or observation. However, the value-added method alarms many teachers—does good teaching always translate to improved test scores?—and they worry that it could be used against them unfairly.
Sanders insists that he is more interested in helping weak teachers improve than getting rid of them. "We hope our research is used for diagnostic purposes," he says, "so teachers can consider what they're doing and how to improve it." In Tennessee, much to Sanders' delight, some teachers are doing just that.
"I've told this story 8 million times," says Sanders, before recounting yet again how he stumbled upon his method of evaluating schools and teachers. He is sitting in his second-floor office at the Statistical and Computing Consulting Services Department, in Morgan Hall, a red-brick Gothic Revival building on the University of Tennessee's Knoxville campus. This morning, Sanders is dressed casually in a blue-and-red plaid shirt and khaki dress slacks, held up by both a belt and suspenders. (You can never be too careful.) His office, with vintage metal chairs and bookcases, wood-paneled walls, and pea-green carpet, recalls another era, say, 1955. Sanders, too, seems out of the past. He is friendly and gracious, with Southern manners and rural charm. In another life, he might have been a railroad clerk or a small-town postmaster. Every now and then, he takes a dip of Skoal chewing tobacco and places it between his cheek and gums.
Sanders grew up on a small dairy farm in middle Tennessee. He attended public schools and, upon graduating from high school, entered the University of Tennessee, where he studied physics and animal science. Statistics, however, became his primary focus. "I was intrigued with how statistical methodology could be applied to so many real-world situations," he says. After earning a doctorate in biostatistics and quantitative genetics, he was hired in 1968 by the Oak Ridge National Laboratory, which was studying the biological effects of radiation. Four years later, however, he was asked to return to UT to create the Statistical and Computing Consulting Services Department, which conducts research for UT scientists from a wide variety of academic disciplines.
In 1982, Sanders happened to read a newspaper article asserting that teacher effectiveness could not be measured quantitatively by looking at students' test scores. Then-governor Lamar Alexander was proposing a plan that would identify Tennessee's best teachers and allow them to receive higher salaries, greater status, and new roles. "The big issue," Sanders says, "was this: If you're going to do something like that, how are you going to measure the teachers? The article I read cited two or three statistical reasons why you couldn't use student achievement data to do that. And I thought, Well, there may be good reasons for not doing it, but those are not good reasons." Sanders was convinced it could be done, and he decided to try to prove it.
He and a colleague, Robert McLean, sent a letter ("More on a lark than anything else," Sanders says) to the governor's office explaining that they would like to use something called a "mixed model" statistical methodology—an approach originally developed by a renowned animal breeder at Cornell University—to show that student achievement data could in fact be used for teacher assessment. The letter got bounced to the state department of education, which eventually furnished the researchers with data. Using three years' worth of student test scores from Knox County, Tennessee, Sanders and McLean proved their theory. The researchers wrote up their findings, and then Sanders called his contact at the education department in Nashville and said, "I'm through."
"With what?" came the reply.
"It's obvious that they had not taken it nearly as seriously as I had," says the professor, smiling. Two later studies confirmed the Knoxville data, but no one seemed particularly interested in Sanders' research. "I thought the whole world was waiting on this!" he says.
Several years later, however, policymakers found new reason to look at Sanders' numbers. Education reform was in the air, and Tennessee legislators were looking for a fresh approach to school accountability. In 1990, Sanders spent about four months in the state capitol meeting with officials, including the new governor, Ned McWherter. Sanders advocated that his system be used statewide, but when it began to look as if that might actually happen, he was momentarily caught off guard. "I was like the dog that chased the car," he recalls. "It looked like the car was going to stop, and I didn't know what I was going to do with it."
Eventually, legislators incorporated Sanders' methodology into the 1992 Education Improvement Act, signed into law by McWherter. Partly a response to a court order to equalize funding between rural and urban schools, the law established a statewide half-cent sales tax to boost K-12 education spending. But it also created a number of reform measures, including the groundbreaking school-accountability program based on Sanders' work, dubbed the Tennessee Value-Added Assessment System, or TVAAS. Sanders: "I told the state commissioner of education that he was going to have to let me go home and assemble a team of folks to build a software system to allow me to do the very things that I had been advocating." The result was the Value-Added Research and Assessment Center at the University of Tennessee, with Sanders at the helm. (Sanders continues to run the university's Statistical and Computing Consulting Services Department, but most of his time is now devoted to education research.)
After the law passed, teachers in Tennessee were skeptical of the value-added component. Many directed their hostilities toward Sanders, an easy target given his affiliation with UT's school of agriculture. How, they demanded to know, could you use a statistical method developed for evaluating farm animals to measure the effectiveness of teachers? But Sanders stood his ground, and he patiently explained his complicated method to anyone who took the time to call or write.
Here's how it works: Each year, in late March or early April, students in grades 3 through 8 take a battery of tests known as the Tennessee Comprehensive Assessment Program, or TCAP, in five subjects: reading, language, math, science, and social studies. Scores are sent to Sanders and his staff, who plug the numbers into an IBM RS/6000 series computer, which merges new test data with scores from previous years. This enables the statisticians to track student achievement over time.
"We follow the progress of each child individually," Sanders says, "and compare each child to his own past performance, not to test scores of other kids." Of course, the growth rate of one student doesn't say much about his or her teacher. But when you look at the data by classroom, by school, and by district, patterns begin to emerge. "And if you find that the majority of kids in a particular classroom have flat spots on the growth curve," Sanders explains, "it becomes strong, powerful evidence that something regarding instruction is not happening in that classroom."
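The basic idea of measuring each child against his or her own past score, and only then looking for classroom-level patterns, can be illustrated with a minimal sketch. This is not the mixed-model methodology that TVAAS relies on; it simply takes year-over-year score differences and averages them by classroom. The records, the score scale, and the function names are all hypothetical, invented here for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student, classroom, last year's score, this year's score).
# The numbers are invented for illustration; they are not real TCAP data.
records = [
    ("s1", "room_a", 610, 640),
    ("s2", "room_a", 700, 728),
    ("s3", "room_a", 550, 585),
    ("s4", "room_b", 620, 622),
    ("s5", "room_b", 705, 701),
    ("s6", "room_b", 560, 563),
]


def student_gain(prior, current):
    """Gain relative to the student's own past score, not to other students."""
    return current - prior


def mean_gain_by_classroom(rows):
    """Average the individual student gains within each classroom."""
    gains = defaultdict(list)
    for _student, classroom, prior, current in rows:
        gains[classroom].append(student_gain(prior, current))
    return {room: mean(values) for room, values in gains.items()}


classroom_gains = mean_gain_by_classroom(records)
for room, gain in sorted(classroom_gains.items()):
    print(room, round(gain, 1))
```

In this made-up data, students in room_b start at roughly the same levels as those in room_a but barely move from one year to the next, which is the kind of classroom-wide "flat spot" Sanders describes. A simple average like this ignores everything the real statistical model is meant to handle, such as missing scores and measurement error.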
In the fall, schools receive their report cards for the previous academic year. The summaries, which are made public and printed in Tennessee newspapers, show how each grade has improved—or not—in each subject, based on national norms. In other words, they reveal how much the children have learned—or how much "value" has been added—over the course of a year. For example, a report card might show that an elementary school's 4th grade reading scores have gone from 85 percent of the national norm to 110 percent, a gain of 25 percentage points. Or it might show that scores have flattened or even dropped.
In Tennessee, the scores have consequences. Once run through Sanders' value-added system, they are combined with other indicators to determine rewards for individual schools and sanctions for districts. Schools whose cumulative gains in the value-added scores from each of the five subjects at least match the gains in the national norm are eligible for additional state funds. Meanwhile, districts that fail to achieve average gains equal to at least 95 percent of the national norm are subject to sanctions.
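The two thresholds described above can be sketched in a few lines. This is only an illustration under stated assumptions: each subject's gain is expressed as a fraction of the national-norm gain (1.0 means the gain matched the norm), "cumulative" is read as the average across the five subjects, and the "other indicators" that Tennessee combines with the scores are left out entirely. The function names and sample figures are hypothetical.

```python
SUBJECTS = ("reading", "language", "math", "science", "social_studies")


def cumulative_ratio(gains_vs_norm):
    """Average of the per-subject gains, each a fraction of the national-norm gain."""
    return sum(gains_vs_norm[subject] for subject in SUBJECTS) / len(SUBJECTS)


def school_eligible_for_funds(gains_vs_norm):
    """Assumed rule: cumulative gains at least match the national-norm gains."""
    return cumulative_ratio(gains_vs_norm) >= 1.0


def district_subject_to_sanctions(gains_vs_norm):
    """Assumed rule: average gains fall below 95 percent of the national norm."""
    return cumulative_ratio(gains_vs_norm) < 0.95


# Invented example figures, not real Tennessee results.
school = {"reading": 1.10, "language": 1.00, "math": 0.95,
          "science": 1.05, "social_studies": 1.00}
district = {"reading": 0.90, "language": 0.92, "math": 0.96,
            "science": 0.94, "social_studies": 0.93}

print(school_eligible_for_funds(school))
print(district_subject_to_sanctions(district))
```

Note that under this reading a school can fall short in one subject (math, in the example) and still qualify, because a strong gain elsewhere offsets it.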
Each teacher also receives a report card, which reveals the academic progress of his or her students. These reports are not made public, and their contents are known only to the teacher and his or her principal. (Of course, in schools where, say, there is only one 4th grade teacher, that teacher's scores are effectively made public in the school report.)
Sanders has always said that scores for individual teachers should not be released publicly. "That would be totally inappropriate," he says. "This is about trying to improve our schools, not embarrassing teachers. If their scores were made available, it would create chaos because most parents would be trying to get their kids into the same classroom."
Still, Sanders says, it's critical that ineffective teachers be identified. "The evidence is overwhelming," he says, "that if any child catches two very weak teachers in a row, unless there is a major intervention, that kid never recovers from it. And that's something that as a society we can't ignore."
Ultimately, Sanders hopes his system will help struggling teachers get support and assistance. But he adds, "After a reasonable period of time, either if they don't try to improve or if they don't improve, then they should be encouraged to seek employment elsewhere."
No teacher in Tennessee has lost his or her job due to poor TVAAS scores, according to Al Mance, executive director of the Tennessee Education Association. The union fought hard to keep the scores confidential, and so far, they are only considered as one of many factors in personnel evaluations. Test results are "only a small piece of the pie," says Mance.
But that's not to say that others aren't eager to use the scores for their own purposes. A few years ago, Dave Shearon, a school board member in Nashville, proposed that the city's school system use the TVAAS data to plot on a map where the most effective and least effective teachers work, using a system of red and green dots—red for the best teachers, green for the worst.
Shearon told a reporter for the Tennessean, "I am concerned about whether or not our distribution of most and least effective teachers is slanted one way or another." He suggested using the information to reshuffle teaching assignments on a voluntary basis. But he backed down after Nashville teachers began wearing red and green stickers to school, mocking the idea.
Marsha Denton wasn't very happy the first time she saw her value-added scores. A social studies teacher at Buena Vista Middle School in Nashville, Denton had always considered herself a first-rate teacher.
"When I looked at my data," she says, "I saw tremendous strength in 7th grade, but my data for 8th grade wasn't as good. It was OK, but not nearly as strong." The numbers, she confesses, "messed with my head."
"I bawled and screamed for a couple of days," she says. "But then I realized it was just like anything else. You can sit around and whine and cry about it all day, or you can say, 'What am I going to do about it?' " Denton decided she would try to use the data to improve her teaching. First, however, she set about learning as much as she could about the value-added approach. And the more she learned, the more convinced she became that it was a useful—and valid—tool for measuring the strengths and weaknesses of teachers.
"It just made so much sense," says the 40-year-old teacher, who has long been suspicious of the efficacy of standardized-test scores. "It seemed to me to be the first reasonable method of evaluating students because it wasn't biased about socioeconomic status. And it seemed like something that I could use."
Denton looked at her scores and concluded that she was using different teaching styles for her two grades. "In my 7th grade classes," she says, "I was getting the students more involved in the learning process. I was more of a facilitator. But in my 8th grade classes, I was the one busting my chops. The kids were sitting there listening and taking notes, and we were interacting and talking. But they weren't doing the thinking—I was the one doing the thinking." So the following year, Denton made some changes in her classroom, and her scores went up. "I had improved in the areas that I had hoped I would."
Word of Denton's expertise with the TVAAS scores spread to other teachers, and it wasn't long before she was in demand throughout the district. Eventually, she was getting so many phone calls that she decided to take a two-year leave of absence from Buena Vista. Now, she holds two jobs: one as a consultant for the Metropolitan Nashville Public Schools and the other as an associate professor of education at Trevecca Nazarene University. She spends much of her time on the road, meeting with teachers, showing them how to use their value-added scores to improve their teaching. Often, she admits, it's a hard sell. "Most teachers are skeptical of value-added because they don't understand it," she says. "I'm not there to change their minds or to tell them what to do. I'm there to educate."
Franklin, Tennessee, is a small town about 15 miles south of Nashville. Last summer, Denton spent three days meeting with a group of teachers who work in the Franklin Special School District. Among them was Jane Brown, a 4th grade teacher at Moore Elementary School.
"I wasn't buying the TVAAS data," Brown says. "I couldn't make it work. But Marsha told us, 'It really doesn't matter if you buy it or not. It's not going to go away. Deal with it.' And she was the first person who said, 'I can give you a way to use this information to your advantage.' "
By traditional measures, Moore Elementary, which serves 500 K-4 students, seems like it's doing a good job. The school consistently scores above both state and national norms in all subject areas. However, the school's value-added scores are consistently below expected growth targets. "And that's been very frustrating," Brown says.
"We used to just look at our test scores," says principal Patricia Green, "and the majority of the students were doing very well. We thought, We must be doing something right. But when we looked at the value-added scores, some of the students weren't doing as well as they should, in terms of growth. And these were some of the better kids. Sure, they were scoring in the 99th percentile, but they weren't gaining as much as they should from one year to the next."
With Denton's help, Brown and her colleagues concluded that they were devoting more time and energy to the school's neediest children—"which is the teacher's first impulse," Brown notes—but not spending as much time working with the top students. This was causing what Sanders calls a "shed" pattern, in which academic gains drop off as achievement level rises, creating a downward slope that resembles a slanting roof. (Indeed, Sanders' research has shown that, in Tennessee, high-achieving students—especially high-achieving minority students—make the least academic progress from year to year.)
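One rough way to picture a "shed" pattern is to plot each student's gain against his or her prior achievement and check whether the trend line slopes downward. The sketch below does this with an ordinary least-squares slope. It is an illustration only, not the analysis Sanders' team performs, and the percentiles, gains, and function names are hypothetical.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def looks_like_shed(prior_levels, gains):
    """Assumed test: gains fall as prior achievement rises (negative slope)."""
    return slope(prior_levels, gains) < 0


# Invented data: prior achievement percentile and the following year's gain.
prior_percentile = [20, 40, 60, 80, 99]
shed_gains = [12, 10, 7, 4, 1]   # the highest achievers gain the least
even_gains = [8, 8, 8, 8, 8]     # every achievement level gains the same

print(looks_like_shed(prior_percentile, shed_gains))
print(looks_like_shed(prior_percentile, even_gains))
```

A negative slope alone does not say why top students are gaining less; at Moore Elementary, the teachers' own explanation was where they were spending their time.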
Moore's teachers are now trying to figure out what to change to reach all students effectively. "But you don't want to throw out the baby with the bath water," Brown says. "You can only change a certain number of variables at a time and then wait and see what happens on the test."
Still, Brown concedes that the TVAAS data is useful. "Now," she says, "instead of change being based on gut feelings, it can be based on quantitative information about the students. I'm looking for every clue I can to find out how I'm doing as a teacher." And while she praises her district for "not getting too out of whack" about the test results, Brown notes that some teachers "stress out" when her school's TVAAS scores are made public every year. "It can be demoralizing," she says.
"There are some teachers here who think this will go away," she adds, "but I don't think it will. And if I had the power to do away with it, I probably wouldn't."
Sanders admits that some teachers "absolutely refuse to look at the data." On that point, Paul Webb, a teacher and longtime critic of Sanders, agrees. "The dirty little secret," he says, "is that most teachers don't pay much attention to it."
Paul and Judy Webb both teach school in and around Newport, Tennessee, about an hour from Knoxville. When it comes to Sanders and his value-added approach, they don't mince words, calling it "an unreliable, invalid, and often slanderous evaluation system." Several years ago, they set up a Web site to disseminate their views. On it can be found just about every charge that has been leveled against Sanders and the TVAAS.
For one thing, they assert, it's too complicated. (Others have made the same charge.) Teachers must take it on faith that Sanders' computer methodology is fair and accurate. "That may be OK for religion," Paul Webb says, "but not when you're talking about education." The Webbs take particular offense at a handbook published by Sanders' office titled, "Using and Interpreting Tennessee's Value-Added Assessment System." The handbook concedes that while mixed-model statistics can be learned, "the subject is complex, and without spending some time and energy on it, one will probably have to go with faith. Incidentally, there is nothing wrong with faith, and it is certainly preferable to ignorance and prejudice."
The Webbs call such statements "arrogant." In short, they argue that the art of teaching cannot be quantified. "Principals," they say, "need only to know great teaching when they see it to ensure quality for Tennessee's children."
To Monty Neill, executive director of the National Center for Fair and Open Testing, or FairTest, Tennessee's value-added system is but one more example of the nation's current obsession with testing and assessment. He admits that "the concept behind Sanders' method is reasonable—that is, kids begin school in different places and grow at different rates. But the problem is in using standardized tests to determine that." And, he adds, "testing all students with a norm-referenced, multiple-choice test that will substantially control curriculum and instruction is too high an educational price to pay for the information gained."
Neill also argues that Sanders' approach "falsely assumes that kids only learn academics at school." What about learning that takes place at home? he asks.
This criticism echoes a 1995 review of Sanders' system by the state comptroller's office. The review cites several failings of the TVAAS, including large year-to-year swings in value-added scores that administrators can't explain. It also faults the system for its assumption that all learning takes place in the classroom. "The model seems to assume that all gain (or lack thereof) is purely teacher-related," the report reads, "while it has not provided adequate evidence to support this contention."
Sanders has spent a lot of time answering the critics. Yes, he agrees, his system is complex, and he doesn't expect teachers to understand the mathematical analysis behind it. "I can use a cell phone without knowing how it works," he says. "All I want to know is that I'm talking to someone on the other end of the line."
As for the concerns about statistical variations, Sanders argues that they are to be expected in the early stages of a new model, and that they will decrease over time.
Regarding the fundamental question of whether teaching, in all its messy and complicated glory, can be quantified, Sanders has this to say: "There is no way you can measure all of the important things a teacher does in the classroom. But that doesn't mean you shouldn't be measuring the things that can be measured."
Sanders likes to say, "I'm the numbers guy, the measurement guy. I'm not the policy guy." But that's misleading. In fact, Sanders has become something of a guru, and wherever he goes, he uses his data to draw some very specific conclusions about education, teachers, and exactly what schools should be doing to make sure that all students are achieving at higher levels. He isn't shy about voicing his opinions on such matters, in person and in print.
Sanders, for example, says that teacher effectiveness is "the single biggest factor influencing gains in achievement, an influence many times greater than poverty or per-pupil expenditures"—a statement that challenges long-held assumptions about the influence of a child's socioeconomic background on his or her learning.
And Sanders has lots to say about the classroom and what makes for good instruction. Effective teachers, he says, "get excellent gains across the entire spectrum of kids in their classroom [because] they've got kids working at different paces and at different places." Ineffective teachers, on the other hand, "tend to focus on the lower-end kids. They may be sincere and conscientious, but they're holding back the others."
Despite his claims that he is an agnostic on policy questions, Sanders argues, "It is imperative that we focus on bringing all our energy and effort to try to shrink the variability of teacher effectiveness, so that it doesn't make so much difference which classroom a child walks into."
In a paper titled "Cumulative and Residual Effects of Teachers on Future Student Academic Achievement," written with his wife, June Rivers—a former teacher who holds a Ph.D. in K-12 administration—Sanders argues that "teacher assignment sequences should be determined to [ensure] that no child is assigned to a teacher sequence that will be unduly hurtful to his or her academic achievement." Translation: Assignments should be reshuffled so that no student has an ineffective instructor more than once.
Sanders also criticizes the standards movement embraced by the nation's policymakers for setting "unrealistic goals" that not all students can attain. "I have a problem with statements like, 'What should 4th graders know and be able to do?' " he says. "Pray tell, which 4th graders are we talking about?"
He adds, "I believe we should visualize the curriculum not as stair steps, but rather as a ramp. I want all kids to go up the ramp, but I recognize that not all kids are going to be at the same place at the same time. What I want to hold educators accountable for is the speed of movement up the ramp, not the position on the ramp."
But ultimately should all students get to the top of that ramp?
"I don't think that's realistic either," he says. "What I want to do is to push all kids as far up that ramp as possible, and if we focus on the gain rate, then achievement levels for all kids are going to be higher than any of us can imagine."
Maybe a statistician has no business making these kinds of policy recommendations. If Sanders is indeed "the measurement guy" and not "the policy guy," why does he use his data to tell educators and policymakers what they should be doing?
Still, it's hard to disagree with much of what he says. Teachers do matter a great deal—everybody knows that—and the best ones probably are the ones who make every effort to reach students at all levels. But determining exactly who are the best and who are the worst—can such judgments really be made based on the results of a single, annual battery of standardized tests?
Though Sanders denies it, he seems to be on a mission to convince anybody who'll listen that his system is by far the best way to measure what goes on in the classroom. "No," he counters, "it's more that I have a responsibility to explain and defend what I advocate." The nonstop travel, he admits, is "physically draining." Indeed, he hopes to cut back on his speaking engagements soon. This summer, he and five members of his value-added research team, including his wife, will leave the University of Tennessee to join SAS Institute, a research firm based near Raleigh, North Carolina. Sanders will be head of the firm's new Educational Value Added Assessment Services division. (He and his staff will continue to crunch the TVAAS numbers for the state of Tennessee.)
But keeping Sanders off the road may be difficult. Preaching seems to be in his blood.
Not long ago, Sanders spoke to a group of teachers in a large urban district in the Northeast. (He won't say where.) "And there's a very strong union in this district," he says. "When I started to speak about using test data as part of teacher evaluation—here I am, a small, white Southerner with a strong Southern accent, in an area where folks are not accustomed to hearing people with Southern accents—there was this din in the audience, lots of people having conversations. Attendance was required by the superintendent, and it was clear that these teachers had just as soon not be there. Needless to say, it was not the warmest reception that I've had in my life. As I started speaking, I thought to myself, Sanders, what are you doing here?
"But I think it's fair to say that, for the last 15 minutes, you could hear a pin drop. One of the union officials came up to me afterwards and said, 'You didn't say a thing that I really disagreed with.' "
Sanders smiles, savoring the victory.
Vol. 11, Issue 8, Pages 42-47. Published in Print: May 1, 2000, as He's Got Your Number