Response: The Role of Student Test Scores in Teacher Evaluations (Opinion)

Larry Ferlazzo

Opinion Contributor, Education Week

Larry Ferlazzo is an English and social studies teacher at Luther Burbank High School in Sacramento, Calif.

The new question-of-the-week is:

What role, if any, should student test scores have in teacher evaluations?

Fortunately - from my perspective, at least - many states seem to be stepping back from including test scores in teacher evaluations.

But others still maintain that requirement.

Today’s guests will share whether they believe that choice is a wise one.

David Berliner, Kathleen Neagle Sokolowski, Douglas Reeves, Timothy Hilton, Amanda Koonlaba, and Erin Scholes will be sharing their thoughts. I’ve also included comments from readers. Though this column doesn’t have an accompanying podcast, you can still listen to past ones here.

You might also be interested in The Best Resources For Learning About Effective Student & Teacher Assessments.

Response From David C. Berliner

David C. Berliner is Regents’ Professor Emeritus at Arizona State University. He has written, coauthored, or edited more than 200 books, articles, papers, and chapters, among them The Manufactured Crisis: Myths, Fraud, And The Attack On America’s Public Schools and Collateral Damage: How High-stakes Testing Corrupts American Schools:

I have never said I wouldn’t use test scores to evaluate teachers: I said I’d never use standardized test scores to evaluate teachers. Teachers may only account for 10 percent or so of the variance in standardized achievement test scores. So, if teachers don’t affect these tests very much--and they do not--why would we use them as indicators of teacher competence?

But there are tests I would use to evaluate teachers: These are the tests teachers give in class. And I would also use the answers students give to those tests in evaluating teachers. These are quite appropriate artifacts from which to judge what that teacher is trying to accomplish, and how well students are mastering the curriculum appropriate to that grade or course. The tests a teacher gives, and the answers given by their students, are easily evaluated by anyone who knows the curriculum for, say, the 4th grade or for high school algebra. Examination of teachers’ tests (not Pearson’s tests), and students’ answers, provides considerable insight into teachers’ curriculum competency and students’ proficiency with that curriculum.

Alongside the collection of classroom artifacts, gathered on different occasions by different observers, would be a number of observations of classroom instruction. Reasonable reliability for the judgements that need to be made about teacher competence can be obtained if we have multiple observations of instruction by multiple observers. These observations and artifact collections are direct measures of what teachers do. Such proximal, direct measures and artifacts of teachers and teaching are much more likely to be valid than indirect and distal measures of teachers and teaching, such as a standardized achievement test. Standardized achievement tests are recognized by the American Statistical Association, and most other educational associations and researchers, as remarkably insensitive to classroom instruction. That cannot be said about students’ answers to teachers’ tests.

Who would make judgements about teacher competence from these data? The school principal is one such person. Of course, if a principal or other administrator of an elementary or high school doesn’t know what 4th graders or algebra students should know and be able to do, they should not be the observer/evaluator--and maybe they shouldn’t be an administrator at that school! But the school principal is certainly an evaluator with a stake in the trustworthiness of these kinds of data. Since we know that more than one evaluator and more than one observation is needed, I’d also use National Board Certified Teachers (NBCTs) as observers/evaluators. They too have a stake in the trustworthiness of the data for judging their colleagues. They have demonstrated their commitment to a high degree of professionalism.

Would a number of classroom visits a year to make these kinds of judgements be expensive, compared to using students’ standardized achievement test scores as a measure of teacher competency? You bet! But let’s ask this question differently: When making judgements about teacher competency, is validity important? You bet! The case for validity is much more likely if the observations and judgements of teachers are made by professionals with the greatest concern for the competency of those being judged--principals and NBCTs.

In my view, America faces a simple choice: Live with a cheap, quantitative, and proven invalid measure of teacher competency; or design an expensive, qualitative, and (likely) more valid measure of teacher competency. I’d vote for the latter, although politicians all seem to vote for the former. By doing so, they harm our profession.

Response From Kathleen Neagle Sokolowski

Kathleen Neagle Sokolowski is a 3rd grade teacher in Farmingdale, NY. She previously taught 6th grade and kindergarten. Kathleen is one of the co-authors of Two Writing Teachers and the co-director of the Long Island Writing Project. She blogs at Courage Doesn’t Always Roar:

Teaching is a complex science and an art. As a public school educator, I welcome all students into my classroom. In third grade, students come to me from a variety of backgrounds, academic levels, and experiences. My job is to meet students where they are and help them learn, grow, and develop. My aim is to help them see their own potential, become empathetic, and work to be problem solvers who strive to make a positive change in the world. Student test scores show a snapshot of one day in the life of a student and not an overall picture of the teaching and learning the students have accomplished. Student test scores should not factor into a teacher’s evaluation whatsoever.

Why not? As a teacher for 16 years, I can tell you that the profession depends on collegiality and sharing of ideas. When teachers’ livelihoods depend on how students score on tests, an environment of competition and stress invades a school. Instead of seeing all the students as “our students,” teachers may strive to work with the highest-achieving students. The focus is on getting students to answer test questions correctly instead of helping students learn. The arts, physical education, and inquiry-based projects, like Genius Hour, get pushed to the side for more test-taking practice. Teachers see each other as competitors instead of colleagues, and pressure is placed on everyone.

I realize that many people make a great deal of money off the testing industry and all the test prep guides, computer programs, and workshops. I know that some politicians like to point to test scores to push a narrative that schools are failing. I understand that there are folks who will say that test scores keep teachers accountable. But the truth is, my students’ test scores will tell you nothing about the work I do over the summer to grow as a professional, the books I read, the online discussions I participate in to gain new teaching ideas. My students’ test scores will not tell you about how I buy each of them a book twice a year so they can grow their home library and have books of their own. My students’ test scores will not tell you how I set up blogs for them so they can be writers with a wide, authentic audience. The scores won’t tell you how I plan instruction, or use formative assessment to make changes along the way, or create newsletters to keep parents informed. I believe in teacher accountability and I welcome feedback and opportunities to improve my craft. It’s just that my students’ test scores will tell you none of this about me as a teacher and very little about my students as learners.

How do we know if teachers are teaching? How do we know if students are learning? We don’t need high-stakes tests to tell us this information. We need administrators who visit classrooms on a daily basis, noting the teaching and learning they observe. We need to look at students as learners over time with samples of work and their own thoughts and reflections on the learning. We need to have conversations about thinking and learning and encourage risk-taking and innovation for teachers and students. When everyone feels afraid to try because jobs are riding on how an 8-year-old answers a multiple-choice question on a random day in May, it makes you wonder what these high-stakes assessments are really accomplishing.

Response From Douglas Reeves

Douglas Reeves is the author of more than 30 books on education and leadership. He blogs at CreativeLeadership.net and Tweets @DouglasReeves:

The late Bill Sanders was the pioneer of value-added assessments, a statistical technique that purported to identify the extent to which an individual teacher could improve the test performance of students. I never doubted Bill’s integrity and good intentions. In personal conversations with me, he would assure me that the primary intent of his work was research to identify the most effective teaching strategies, not evaluation.

Unfortunately, Bill’s research was perverted by states and school systems with tragic consequences. Schools would publish the value-added scores of teachers, leading to at least one suicide. While Bill, a statistician by training, understood that the value of his method relied upon large sample sizes, districts and states used value-added methods with tiny sample sizes, leading to wild variations in scores - today’s “A” teacher could be tomorrow’s “F” teacher, without a single change in teaching technique. Worst of all, the implementation of the value-added model in districts with high levels of student transiency led to the “missing values” problem, in which students who were not conveniently present for three consecutive years of testing would have their scores estimated by a model based on their demographic characteristics.

There is a broad consensus today that the value-added model lacks the fundamental properties of reliability and validity, and yet some states remain stubbornly loyal to this discredited model. I understand their desire for test-based accountability, but there is a much better way. Just have short 30-40 minute assessments administered in the fall, winter, and spring. Many districts already have this. It allows a clear observation of the progress of the same student with the same teacher within the same year. Whether a student starts at, below, or above grade level, this sort of assessment allows leaders to identify where student progress is, and is not, occurring. Most importantly, however, teaching practices - not just test results - must be the foundation of effective teacher evaluation. The international leader on this point is Kim Marshall who, amazingly, provides his observation instruments for free for schools around the world at MarshallMemo.com. Kim’s work saves money and time for administrators and teachers and is a model to follow.

Response From Timothy Hilton

Timothy Hilton currently teaches high school Social Studies in South Central Los Angeles, and has taught in the area for the past 9 years. Timothy has experience teaching every level of social studies ranging from Advanced Placement to English Language Development. In addition to teaching in the inner city of Los Angeles, Timothy is currently a doctoral student at Claremont Graduate University studying Educational Policy, Evaluation, and Reform:

Testing is a part of education. Teachers use tests to gauge students’ initial understanding, check for understanding as lessons progress, and assess student learning at the end of a unit or course. Given the important role testing has in the education of our students, the hot-button question is what role student test scores should take in assessing our teachers.

Like any controversial topic, there are two sides to the testing-as-evaluation argument. On the one hand, test scores are generally accepted as a decent gauge of student learning. If a student does well on a test, there is a reasonable assurance that the student has learned at least some of the content. Continuing with this logic, it can reasonably be assumed that a good teacher will produce gains in student test scores.

On the other hand, teaching is nearly universally accepted by those in the classroom and on school sites as highly nuanced. Teachers will make the argument that the education of a student is about far more than just learning content. Teachers support students’ social and emotional development. Teachers work to make students aware of the world around them, and develop empathy and compassion. None of this can be measured on a test.

So, which is it? Do we use test scores, or rely on observations to determine teacher quality?

Why not both?

No self-respecting teacher would use a single student grade on a single assignment as a final grade for the entirety of a course, so why would we rely on one source of information in the determination of a teacher’s overall quality? The more data that can be provided, the more accurate the teacher evaluation decisions will end up being. Teacher evaluations should incorporate as many pieces of data as possible. Administration observation, student surveys, student test scores, professional portfolios, and on and on. The more data that is used, the more accurate the picture it will paint.

While the addition of multiple data sources opens the door for some very creative evaluation techniques, I would argue in favor of a balanced evaluation in which teacher observations (25%), test scores (25%), student surveys (25%), and a professional portfolio (25%) are combined to evaluate teachers. This, however, is just one potential solution where test scores can be incorporated without being overly relied upon.
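
As a rough illustration of the equal-weight scheme described above, the sketch below combines four component scores into a single composite. Only the 25% weights come from the response; the 0-100 scale, the component names, the function name, and the sample scores are assumptions made purely for illustration.

```python
# Hypothetical equal-weight composite. Only the 25% weights come from the text;
# the component names and the 0-100 scale are assumptions.
WEIGHTS = {
    "observations": 0.25,
    "test_scores": 0.25,
    "student_survey": 0.25,
    "portfolio": 0.25,
}


def composite_score(scores, weights=WEIGHTS):
    """Return the weighted average of component scores (assumed 0-100 scale)."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the weighted components")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Made-up scores for a single teacher, not real data.
example = {
    "observations": 90,
    "test_scores": 60,
    "student_survey": 80,
    "portfolio": 70,
}
print(composite_score(example))  # prints 75.0
```

With these made-up numbers, the low test-score component pulls the composite down to 75.0 rather than determining it outright, since each component carries only a quarter of the total weight.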

Response From Amanda Koonlaba

Amanda Koonlaba, Ed. S. is an educator with over 12 years of experience teaching both visual art and regular education. She is a published author and frequent speaker/presenter at education conferences. Amanda was named the Elementary Art Teacher of the Year for the state of Mississippi in 2016. She holds an Elementary and Middle Childhood Art certification from the National Board for Professional Teaching Standards. Amanda is on a mission to ensure every student in America has access to a high-quality arts-based education. You can connect with her at Party in the Art Room or on Twitter:

I might be a radical, but I think test scores should have minimal to no impact on teacher evaluations. (Sigh. Yeah, I’m a radical.)

I think test scores should be used to guide instruction. When a student doesn’t master something, it should be retaught. That is the proper use of assessment, in my opinion.

I think using test scores in evaluations puts too much performance stress on the students. I mean, their teacher’s job is on the line when that is how test scores are used. Students are aware of this. It also makes teaching be about the tests. (GASP!)

I think teaching should be about learning. I also think there should be many pieces to a teacher evaluation. If test scores are a part of evaluation, the weight on the overall evaluation score should be minimal. I really don’t want my child’s teacher to have their career on the line because of my child’s test scores. I don’t want my children to be burdened with this either. I want them to be assessed so that they can learn. If they don’t master a skill, I want them to be retaught. That’s it. That’s pretty much all I want for my kids regarding assessment.

I worked on a collaborative report for the National Education Association called Changing the Story: Transformation toward Fair Accountability and Responsibility in Public Education. This was a project where NEA members were able to discuss accountability. This was in an online forum with prompts. After a discussion period, a core group of members worked on compiling the information into a recommendation report. (Check it out. It is an interesting read.)

One of the recommendations was to exclude standardized test scores from playing any role in a teacher’s evaluation. I remember how hard we fought to get that in there. We wrote it and were asked to rethink it. This was a defining moment for me as a professional. This is when I learned about the movement to privatize public education. I was awakened to how the testing industry plays a role in that. (It is a convoluted issue that I cannot explain here. There isn’t enough space.) Anyway, I wrote about the accountability report experience for Anthony Cody’s Living in Dialogue. I mention all of this to provide a frame of reference because one piece of that accountability report sticks out in my mind as I respond to this question about test scores and teacher evaluations. It reads “Standardized tests might determine whether a student has learned basic content, but they cannot measure the depth of understanding or vastness of creativity found in an engaged classroom. They provide no opportunity to demonstrate multi-level learning or measurement for intuition, innovation, or character. In schools all across the country teachers who are well-regarded by students, parents, principals, and colleagues were rated ineffective after student test scores were included in the VAM statistical algorithm. The American Statistical Association has urged states and school districts against using VAM to make personnel decisions.”

VAM stands for Value-Added Models. As stated, these are highly complicated mathematical formulas that attempt to measure a teacher’s impact on student achievement apart from the other factors that impact student achievement. These try to isolate the teacher’s influence on achievement. Statisticians have warned that this doesn’t work. Yet, there are so many states still using this to evaluate teachers. It just doesn’t make sense to me. It is too experimental and ineffective for me to feel it is appropriate to use this in evaluating teachers. Teachers are human beings. This is a matter of the livelihoods of human beings and their families. We shouldn’t be basing this on evaluations weighted heavily with test scores.

Response From Erin Scholes

Erin Scholes is beginning her 13th year of teaching middle school students. She currently teaches 7th grade students math in northern Connecticut. She is an advocate for young adolescents:

Using test scores to evaluate teachers devalues teachers, their students and what occurs in the classroom. When asked what I teach, I respond “I teach 130 seventh grade students.” My role as a teacher stretches far beyond the math standards. For the students, their learning happens far beyond the classroom walls.

A test measures a specific set of knowledge, at a single moment in time, in a very specific way. Each section of this statement raises its own set of concerns. By testing a “specific set of knowledge” (usually math and language arts), we are expecting all students to be at the same place in their understanding of the concepts. When only looking “at a single moment in time,” how do we know that was a good day for that student to take a test? What if they didn’t have breakfast or had an argument that morning? There are so many factors that influence a student’s life, none of which are taken into consideration when looking at a test score, and most of which are out of the control of the teacher, who will then be evaluated based on that score. Tests are “a very specific way,” and are only one way, of examining what a student knows, and yet they are the “preferred” method of evaluation. There are children who don’t perform well on tests; however, that doesn’t mean they don’t understand the material.

By putting an emphasis on language arts and math, we lose the opportunity to provide a well-rounded education that celebrates multiple forms of intelligence and the connections among the subjects. It also creates an inequitable evaluation process that requires either that all teachers be evaluated on standardized testing or that each teacher create/use their own subject-specific assessment. The former puts a lot of pressure on math and language arts, and the latter requires a lot of testing. When class time is limited, especially in classes like art, music, and physical education, we shouldn’t be asking teachers and students to use that time to test and write. Rather, we need to provide them with more opportunities to be active and creative.

If a student’s social, emotional or physical needs are not met, they are not going to retain new learning. It is important to build relationships with students, in order to better understand their personal struggles both inside and out of the classroom. It is important to help students take responsibility and ownership for their actions, and use mistakes as learning opportunities.

At the end of the year, I ask my students “what is the most important lesson Ms. Scholes taught you?” I will get a few responses like “dividing fractions” or “how to solve an algebraic equation.” Most often the vast majority of the responses are “it is ok to make mistakes,” “how to study” or “that math can be fun.” The statements that mean the most are those that come from students who struggled during the year, and probably didn’t perform as well on tests. Like those students who are trying to figure out who they are while dealing with social and family issues. When they write, “Ms. Scholes taught me it is ok to be myself” ... that is what teaching is about. To be perfectly honest, I don’t care if those students remember a thing about ratios, I know they will continue to grow and be successful because they learned how to love themselves. Those are lessons that can’t be tested, or quantified, yet they are the ones that matter most. Those are the lessons that make a teacher “Exceptional.”

Responses From Readers

Entire evaluation of teacher must not be based on the test scores of pupils. It can be one variable out of many. Some have false belief that “all is good if marks are good.” Such an approach has negative consequences on students’ academic and teachers’ professional life.
-- ALI_ELT (@M_Ali_Linguist) August 9, 2018

Diagnosis, nothing else.
-- @cgilKMGR (@cgilKMGR) August 8, 2018

Potentially, but the biggest fallback I see is that it makes teachers compete against each other instead of help each other.
-- JulyDa (@julydamusic) August 8, 2018

Thanks to David, Kathleen, Douglas, Timothy, Amanda, and Erin, and to readers, for their contributions!

Please feel free to leave a comment with your reactions to the topic or directly to anything that has been said in this post.

Consider contributing a question to be answered in a future post. You can send one to me at lferlazzo@epe.org. When you send it in, let me know if I can use your real name if it’s selected or if you’d prefer remaining anonymous and have a pseudonym in mind.

You can also contact me on Twitter at @Larryferlazzo.

Anyone whose question is selected for this weekly column can choose one free book from a number of education publishers.

Education Week has published a collection of posts from this blog, along with new material, in an e-book form. It’s titled Classroom Management Q&As: Expert Strategies for Teaching.

Just a reminder--you can subscribe and receive updates from this blog via email or RSS Reader. And, if you missed any of the highlights from the first six years of this blog, you can see a categorized list below. They don’t include ones from this current year, but you can find those by clicking on the “answers” category found in the sidebar.

This Year’s Most Popular Q&A Posts

Classroom Management Advice

Race & Gender Challenges

Implementing The Common Core

Best Ways To Begin The School Year

Best Ways To End The School Year

Student Motivation & Social Emotional Learning

Teaching Social Studies

Project-Based Learning

Using Tech In The Classroom

Parent Engagement In Schools

Teaching English Language Learners

Education Policy Issues

Differentiating Instruction

Math Instruction

Science Instruction

Advice For New Teachers

Author Interviews

Entering The Teaching Profession

Administrator Leadership

Teacher Leadership

Relationships In Schools

Professional Development

Instructional Strategies

I am also creating a Twitter list including all contributors to this column.

Look for the next question-of-the-week in a few days.

The opinions expressed in Classroom Q&A With Larry Ferlazzo are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
