(This is the second post in a three-part series on this topic. You can see Part One here.)

This week’s question-of-the-week is:

If we don’t want to use student test scores as part of a teacher evaluation, then what are alternatives?

Part One in this series included responses from American Federation of Teachers President Randi Weingarten, California Teachers Association President Dean Vogel, and 2012 National Teacher Of The Year Rebecca Mieliwocki.

Today’s Part Two features contributions from Julian Vasquez Heilig (with Lisa Hernandez), Ben Spielberg, David Berliner and Paul Bruno.

You can also listen to a ten-minute conversation I had with Julian and Ben on this topic at my BAM! Radio Show.

Response From Julian Vasquez Heilig

Julian Vasquez Heilig is a Professor of Educational Leadership and Policy Studies at California State University Sacramento. He blogs at Cloaking Inequity, consistently ranked one of the top 50 education blogs by Teach100. He wrote this post with input from Lisa Hernandez, who is currently a second grade teacher in the Austin Independent School District and a graduate student in the Educational Policy and Planning program at the University of Texas at Austin:

Some school “reformers” are determined to allocate a number to every aspect of education. No Child Left Behind accentuated the current infatuation with quantitative data in educational policy. Arne Duncan continued the quantitative love affair via teacher evaluation requirements in Race to the Top and other initiatives. As a result, many school districts throughout the United States are under pressure to meet various teacher evaluation requirements that exist due to local, state, and federally implemented policies. However, there are empirically tested qualitative alternatives to the current quantitatively based teacher accountability and evaluation systems. These alternatives can access the expertise of master teachers instead of a number cruncher sitting in front of a computer screen.

Peer Assistance and Review (PAR) was developed in the early 1970s in Toledo, OH. Harriet Sanford wrote that PAR is a program of structured mentorship, observation and rigorous, standards-based evaluation of teachers by teachers, and has been demonstrated to be among the strongest ways to develop great teachers. What is PAR and how does it work? In PAR, the local teacher organization and district administrators jointly manage the teacher evaluation program to improve teacher quality by developing a structure where expert teachers mentor and evaluate their peers.

The Harvard Project on the Next Generation of Teachers posited,

“PAR challenges most people’s expectations about what teachers and principals should do. It requires unusual collaboration between the union and administration. It must be grounded in a systematic approach to teacher evaluation... Increasingly, policymakers, district officials, and union leaders have pointed to PAR as a promising component of an effective human capital strategy, thus fueling interest and initiatives across the country.”

Unlike relatively inexpensive teacher evaluation models that rely heavily on unstable Value-Added models, PAR is an expensive reform, costing $4,000 to $10,000 per teacher served. However, a Harvard study demonstrated that PAR affords the district “a range of financial savings and organizational benefits that offset program costs.” The benefits accrue in two primary ways. First, the mentoring components of PAR help teachers succeed and avoid burnout, thus increasing retention. Second, PAR, as a collaboration between the district and teachers, helps ineffective tenured teachers improve; those who do not improve are subject to dismissal “without undue delay and cost because of the program’s clear assessment process and the labor-management collaboration that underpins it.”

The peer review and distributed leadership processes in PAR also empower teachers to construct their own understanding and knowledge of the world through experience and reflection upon those experiences. The expert consulting teacher (CT) positions in PAR also create specialized roles within the teaching profession, in which CTs are responsible for mentoring and evaluating peers. These roles differentiate the work and future career opportunities of teachers.

In summary, when creating teacher evaluation systems, the primary data should come from veteran teachers who provide a variety of expert perspectives. PAR is not experimental and has been demonstrated successfully in districts across the United States (See the NEA Foundation’s Online PAR Training module #11 here). This constructivist approach identifies and fosters high quality educators by producing useful qualitative and quantitative data for efficacious teacher evaluation. A primary reason so many districts are flummoxed by issues relating to accountability and teacher evaluation systems is that they are being forced to rely upon “junk science” statistical modeling that utilizes non-generalizable data from test scores. As an alternative, PAR is a locally-based approach in which veteran teachers create qualitative and quantitative data that is actually meaningful for sound teaching practice.

Response From Ben Spielberg

A Teach For America alum, Ben Spielberg has worked as a math instructional coach for middle and high school teachers and has spent the last two years on the Executive Board of the San Jose Teachers Association. Ben holds a B.S. in Mathematical and Computational Sciences from Stanford and blogs at 34justice.com:

Teachers unions and most reform organizations actually agree that the primary purpose of teacher evaluation is to promote teacher growth. Making student test scores a defined percentage of teacher evaluations unfortunately undermines this purpose; probability theory and a large body of research suggest that judging teachers based on student outcomes may, ironically, decrease the likelihood of better student outcomes in the long run. Teacher evaluation systems (and evaluation systems in any other profession, for that matter) should instead focus on the teacher’s “locus of control” - the actions the teacher takes in pursuit of better student outcomes. This approach increases the long-term likelihood of student success because it more accurately captures a teacher’s contribution to student learning and gives teachers more actionable feedback. It is also much fairer than alternative systems.

The innovative new evaluation system negotiated by San Jose Unified School District (SJUSD) and the San Jose Teachers Association (SJTA) is one example of this more promising approach to evaluation. The teacher and administrator co-chairs and two teacher members of SJUSD’s newly formed Teacher Quality Panel (TQP) recently defined the following five standards for effective teaching:

1) Teachers create and maintain effective environments for student learning.

2) Teachers know the subjects they teach and how to organize the subject matter for student learning.

3) Teachers design high-quality learning experiences and present them effectively.

4) Teachers continually assess student progress, analyze the results, and adapt instruction to promote student achievement.

5) Teachers continuously improve and develop as professional educators.

Modeled after the California Standards for the Teaching Profession, these standards are accompanied by more specific performance criteria, descriptive examples and non-examples of what successful execution might look like, and sample sources of data evaluators might consider in addition to classroom observations. Teachers who meet standard #2, for example, are strong unit and lesson planners. They anticipate sources of student confusion and identify possible misconceptions before they occur. A teacher’s assignments, grading rubrics, assessments, and short- and long-term plans can all, to some extent, indicate the teacher’s skill with standard #2.

Because teachers’ instructional delivery lies at the heart of standards #1-#4 (and encompasses most of standards #1 and #3), SJUSD has invested time and money into improving the validity and usefulness of classroom observations. SJUSD evaluators take an 8-day course called Analyzing Teaching for Student Results to improve their ability to assess a teacher’s instructional and classroom management strategies and “communicate what they have observed orally and in writing in a balanced manner.”

Teachers have access to two evaluators - one administrator and one consulting teacher who has taught similar content and/or grade levels - who must make “frequent visits with feedback and informal support” in addition to formal, full-period observations. The entire evaluation process focuses on observable, clearly defined teacher practices, collaboratively developed by the TQP, that everyone believes will benefit students. It also leaves sufficient room for teacher creativity and innovation. Whereas the use of standardized test scores as a defined percentage of teacher evaluations reduces the evaluation conversation to a narrow, often inaccurate definition of successful practice, this focus more effectively encourages teachers to “set goals, receive specific feedback about areas for professional growth, and engage with supervisors in meaningful discourse about areas of strength and [those that need] improvement.”

Input-based teacher evaluation holds several advantages over student outcome-based evaluation - it is completely within the teacher’s control, more easily tied to professional development and support, and significantly more fair. Most importantly, robust input-based evaluation is more likely than the alternative to lead to the high-quality teaching that students in every classroom deserve.

Response From David Berliner

David C. Berliner is Regents’ Professor Emeritus in the Mary Lou Fulton College of Education at Arizona State University. He is a member of the National Academy of Education, the International Academy of Education, and a past president of the American Educational Research Association. His interests are in educational psychology, the study of teaching, and educational policy:

If I took evaluation as seriously as does the air force or the airlines, and was actually willing to spend money on evaluation because I thought teaching America’s youth was as important as flying an airplane, here is what I might do.

I’d first train, and then require, principals or other administrators to do approximately 5 classroom observations per year. They run the school, and it ought to be the primary focus of their job to have the finest employees they can hire and retain. Collection of artifacts by these administrators is number two. That includes assignments made and teacher-made tests given, essays produced by students, etc. Observation of teachers by teachers who were named in neighboring districts as outstanding teachers would be my third source of data (4-5 such observations by 4-5 different peers might be ideal). Parent surveys are the fourth source of data. This all leads to serious meetings at the end of each year to discuss all four sources of evidence in order to design individualized plans of professional development for the following year, or counseling out of the profession.

As in universities I have worked in, every district ought to have a fund that allows the buyout of a tenured teacher who is judged to be inadequate, with approximately one year’s salary if they’ll just resign that year without a legal battle. No aggregate standardized test scores are needed to make reliable judgements about the quality of teachers in this system, which is appropriate, since teachers have huge effects on individual students, but have much less effect on the aggregate scores of classrooms, schools, districts, states and nations.

Response From Paul Bruno

Until very recently, Paul Bruno taught middle school science in Southern California. He writes about education on Twitter and on his personal website:

There are certainly reasons to be concerned about the use (and misuse) of student test scores in teacher evaluation. Still, it is worth remembering that many of the criticisms of “value-added models” - e.g., that they may be unstable or biased - could be leveled with equal justification against classroom observations.

There is also good evidence - perhaps more so than for other methods of evaluation - that value-added models of teacher effectiveness based on test scores are significantly related to long-term outcomes for students. So it would not be obviously unreasonable to incorporate test score information into teacher evaluations. Indeed, many teachers already consider such information when evaluating their own work.

Of course, even if test scores are one component, it’s doubtful they should constitute more than a modest fraction of teachers’ overall evaluations; measuring a range of other teacher inputs and outputs remains important.

Classroom observations can be a useful and important way for evaluators to get a sense for whether teachers are using appropriate methods (e.g., scaffolding) or getting results (e.g., keeping all students engaged and on-task). There are nevertheless limits to what observations can accomplish, in part because there is not a clear, comprehensive, widely-accepted set of “best practices” for teaching, and in part because much of what teachers do will not be captured in an observation and may happen outside of regular class hours anyway.

One way to supplement observations may be to move toward a system in which teachers make a more affirmative case for themselves, similar in some ways to what faculty often do in the higher education system. On this model, teachers could collect evidence of their own strengths and contributions - classroom videos, curricular materials, assessment results, documented after-school work, and so on - and bring that evidence to their evaluators.

This happens to a very small degree already, especially if teachers feel the need to defend themselves against a low score on an evaluation rubric. However, by formalizing the expectation that teachers will demonstrate and document their performance and contributions, administrators may get a better picture of whether teachers are consistently meeting expectations and teachers may be more confident that their efforts will not go unnoticed.

Finally, it is worth always keeping in mind that we do not know the “best” way to evaluate teachers, or even if there is a single best way. We do not have definitive research on the long-term effects of different evaluation systems, and all systems involve trade-offs that will vary from context to context. It therefore probably makes sense to keep an open mind and be willing to experiment with reasonable methods that satisfy general principles of fairness and transparency.

Thanks to Julian, Ben, David and Paul for their contributions!

Please feel free to leave a comment with your reactions to the topic or directly to anything that has been said in this post.

