When Testing Fails
A recent spate of articles and education blogs brings encouraging news of research showing that standardized test scores are an unreliable way to measure the effectiveness of a single teacher. It’s wonderful to hear this from people who are not teachers, but of course, it’s not really news: those of us in the classroom, the real teaching experts, have known this all along. Consider this article a friendly “we-told-you-so.”
I’m not suggesting that test scores are without value, or that teachers can’t be assessed and expected to account for their practices. However, it is vital that everyone who cares about teacher quality understand how many variables, apart from the single classroom, affect measurements of student performance.
Test data are appealing because they go into number-crunching spreadsheets so neatly. However, if we’re going to take a scientific approach to instructional evaluation, how can we tolerate definitive conclusions based on the results of an experiment with dozens of unknown, uncontrolled, significant variables?
Want to see what I mean? Please enter my hypothetical high school laboratory for a quick, six-year study.
The Rise of the “Effective” Teacher
It is Year One. I teach 10th graders in my English class for fifty minutes a day. For another fifty minutes a day, half of my students are in social studies classes run by exceptional teachers recently trained in teaching literacy skills using history textbooks. My other students spend their time with history teachers who have not yet received specialized training to support reading skills. Now, when their reading scores come back, which of my students are more likely to show greater gains in their reading skills? How do we factor in all of the other text-based instruction that students receive? When my students have received varying forms of relevant instruction outside of my class—for far more hours than they spend in my class—what do reading test results say about my effectiveness? Our experiment begins with thoroughly compromised data if the goal is to evaluate my work.
The next year begins with positive developments: The county’s new adolescent health care initiative has helped a number of families better manage their children’s asthma and diabetes, so those students now spend more time in the classroom. By now, all of the history teachers have undergone the reading instruction training. A new push towards college preparation for all students has put more students in foreign language classes, which means additional study of transferable grammatical concepts. Our librarian has secured a grant that brings an infusion of recent fiction books, and a quarter of my students start reading more in their free time. Has this combination of events made me a better teacher? Apparently, yes: my students’ test scores just went up! I am magical. I amaze.
The news keeps improving. One year later, the science teachers, too, are tackling reading instruction. The physical education department has upgraded its facilities, so students coming to my class from P.E. now arrive on time more regularly, feeling clean and comfortable. Some parents have stepped forward to donate more classroom books, magazines, and newspapers for my students to read. Half of my students take their standardized tests in a classroom where the teacher has accidentally left useful educational posters uncovered on the walls. Look at those scores! I am brilliant. I levitate.
The Fall of the “Effective” Teacher
Because of my obvious effectiveness, in Year Four I am asked to take on a new role, and I start teaching 9th graders instead. These 9th graders have struggled to match the test results that are clearly a product of my pedagogical prowess. However, the history and science teachers at this grade level have had less training in reading instruction, so my students have less reinforcement. Budget cuts have reduced librarian hours, and library usage has declined proportionally. And this year, every teacher follows the proper testing procedure, covering up informative posters and thereby correcting the undetected mistake that inflated my students’ results last year. Unsurprisingly, scores are down this year. So, what has happened to my teaching skills? What did I do that made me less effective?
The next year starts off poorly, as one of my family members suffers a serious illness requiring hospitalization, and I miss two weeks of school. A breach in computer network security prompts the district to lock down the network so tightly that teachers and students can no longer reach valuable online resources we relied on last year. Students circulate a petition against excessive springtime testing; they know that the state test scores have no bearing on their school grades or college applications, and talk of a boycott has dampened the campus atmosphere. Another disappointing set of test scores arrives. I’m pathetic. Why can't I teach anymore?
And finally, Year Six begins with new district boundaries that bring a third of my students from different middle schools, requiring changes in my curriculum. An administrator in our school takes a leave of absence, causing some disarray in various school procedures. The state has altered this year’s testing calendar so that the tests come five weeks earlier. With the economy in decline, the number of parents paying for extra tutoring outside of school has dropped significantly. Alas, look at those scores: down for the third year in a row! I'm dreadful. Why did anyone ever think otherwise?
The Lab Report
You may now exit my laboratory, and we’ll review the experiment. All of my examples addressed entirely realistic variations—these conditions exist and they change, regularly, in every school and district across the country. I noted only a handful of variables for each year, when in fact, there are hundreds. And most importantly, not one of those variables had anything at all to do with my knowledge, my skills, or my instructional practices, yet some pundits, policymakers, and even administrators cling to the belief that data from a single test reveal what we need to know about my teaching.
I’m well aware that there are studies showing teacher quality is the most important predictor of student success. Those studies matter because they compile data on a broad enough scale that the effects of all these other variables can be minimized.
My point is that conclusions drawn from vast amounts of testing data do not automatically justify using the same tools to measure effects when the scale shrinks to a single teacher. That is precisely what some recent writings have presented as news, when for teachers it was self-evident all along.
Again, I'm not suggesting we shouldn't test, or that we shouldn't examine the results, especially for an entire school or a school system. But I hope certain education experts out there, especially those whose expertise in school research and policy is not matched by a real understanding of schools, will recognize the futility of relying on any single measure, particularly to evaluate individual teachers. It’s pointless to keep looking for a straight line connecting individual teacher quality and student performance on one test; we need to recognize that the line is never straight, and it usually isn't even a line.