Teaching Profession Opinion

Probing the Science of Value-Added Evaluation

By R. Barker Bausell — January 15, 2013 6 min read

Value-added teacher evaluation has been extensively criticized and strongly defended, but less frequently examined from a dispassionate scientific perspective. Among the value-added movement’s most fervent advocates is a respected scientific school of thought that believes reliable causal conclusions can be teased out of huge data sets by economists or statisticians using sophisticated statistical models that control for extraneous factors.

Another scientific school of thought, especially prevalent in medical research, holds that the most reliable method for arriving at defensible causal conclusions involves conducting randomized controlled trials, or RCTs, in which (a) individuals are premeasured on an outcome, (b) randomly assigned to receive different treatments, and (c) measured again to ascertain if changes in the outcome differed based upon the treatments received.
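
The three steps just described can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `run_trial`, the made-up pre-scores, the noise level, and the size of the treatment effect are all hypothetical and do not come from any real trial.

```python
import random
import statistics


def run_trial(pre_scores, treatment_effect, noise_sd=5.0, seed=0):
    """Simulate one two-arm randomized trial and return the difference in mean gains.

    All inputs are hypothetical; this is an illustration, not a real analysis.
    """
    rng = random.Random(seed)

    # (a) Participants arrive already premeasured: pre_scores holds one score each.
    ids = list(range(len(pre_scores)))

    # (b) Random assignment: shuffle the participants and treat the first half.
    rng.shuffle(ids)
    treated = set(ids[: len(ids) // 2])

    # (c) Measure again and record each participant's change on the outcome.
    gains = {"treatment": [], "control": []}
    for i, pre in enumerate(pre_scores):
        arm = "treatment" if i in treated else "control"
        effect = treatment_effect if arm == "treatment" else 0.0
        post = pre + effect + rng.gauss(0.0, noise_sd)  # simulated post-score
        gains[arm].append(post - pre)

    # The estimate is the difference between the two arms' average change.
    return statistics.mean(gains["treatment"]) - statistics.mean(gains["control"])


# Sixty hypothetical participants, so 30 per arm.
pre = [50.0 + (i % 10) for i in range(60)]
estimated_effect = run_trial(pre, treatment_effect=20.0)
print(round(estimated_effect, 2))
```

Because chance alone decides who lands in which arm, the two groups should start out with a similar propensity to change, so the difference in average gains can reasonably be attributed to the treatment.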

The purpose of this brief essay is not to argue the pros and cons of the two approaches, but to frame value-added teacher evaluation from the latter, experimental perspective. For conceptually, what else is an evaluation of perhaps 500 4th grade teachers in a moderate-size urban school district but 500 high-stakes individual experiments? Are not students premeasured, assigned to receive a particular intervention (the teacher), and measured again to see which teachers were the more (or less) efficacious?

Granted, a number of structural differences exist between a medical randomized controlled trial and a districtwide value-added teacher evaluation. Medical trials normally employ only one intervention instead of 500, but the basic logic is the same. Each medical RCT also has its own comparison group, while individual teachers share a common one (consisting of the entire district’s average 4th grade results).

From a methodological perspective, however, both medical and teacher-evaluation trials are designed to generate causal conclusions: namely, that the intervention was statistically superior to the comparison group, statistically inferior, or just the same. But a degree in statistics shouldn’t be required to recognize that an individual medical experiment is designed to produce a more defensible causal conclusion than the collected assortment of 500 teacher-evaluation experiments.

How? Let us count the ways:

• Random assignment is considered the gold standard in medical research because it helps to ensure that the participants in different experimental groups are initially equivalent and therefore have the same propensity to change relative to a specified variable. In controlled clinical trials, the process involves a rigidly prescribed computerized procedure whereby every participant is afforded an equal chance of receiving any given treatment. Public school students cannot be randomly assigned to teachers between schools for logistical reasons and are seldom if ever truly randomly assigned within schools because of (a) individual parent requests for a given teacher; (b) professional judgments regarding which teachers might benefit certain types of students; (c) grouping of classrooms by ability level; and (d) other, often unknown, possibly idiosyncratic reasons. Suffice it to say that no medical trial would ever be published in any reputable journal (or reputable newspaper) which assigned its patients in the haphazard manner in which students are assigned to teachers at the beginning of a school year.

• Medical experiments are designed to purposefully minimize the occurrence of extraneous events that might potentially influence changes on the outcome variable. (In drug trials, for example, it is customary to ensure that only the experimental drug is received by the intervention group, only the placebo is received by the comparison group, and no auxiliary treatments are received by either.) However, no comparable procedural control is attempted in a value-added teacher-evaluation experiment (either for the current year or for prior student performance) so any student assigned to any teacher can receive auxiliary tutoring, be helped at home, team-taught, or subjected to any number of naturally occurring positive or disruptive learning experiences.

• When medical trials are reported in the scientific literature, their statistical analysis involves only the patients assigned to an intervention and its comparison group (which could quite conceivably constitute a comparison between two groups of 30 individuals). This means that statistical significance is computed to facilitate a single causal conclusion based upon a total of 60 observations. The statistical analyses reported for a teacher evaluation, on the other hand, would be reported in terms of all 500 combined experiments, which in this example would constitute a total of 15,000 observations (or 30 students times 500 teachers). The 500 causal conclusions published in the newspaper (or on a school district website), however, are each based upon a separate contrast of one of 500 “treatment groups” (each composed of changes in outcomes for a single teacher’s 30 students) versus essentially the same “comparison group.”

• Explicit guidelines exist for the reporting of medical experiments, such as (a) specifying how many observations were lost between the beginning and the end of the experiment (which is seldom done in value-added experiments, but would entail reporting student transfers, dropouts, missing test data, scoring errors, improperly marked test sheets, clerical errors resulting in incorrect class lists, and so forth for each teacher); and (b) stating whether statistical significance was obtained, which is impractical for each teacher in a value-added experiment since the reporting of so many individual results would violate multiple statistical principles.
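
The observation counts in the third point, and one possible reading of the concern about reporting so many individual results in the fourth, can be worked through with a little arithmetic. The sketch below is illustrative only. The class size of 30 and the 500 teachers come from the example above. The 0.05 significance threshold, the function names, and the premise that every teacher is truly average are assumptions made here and are not stated in the essay.

```python
STUDENTS_PER_GROUP = 30  # from the essay's example
TEACHERS = 500           # from the essay's example
ALPHA = 0.05             # ASSUMPTION: a conventional threshold, not given in the essay


def observations_single_trial(group_size=STUDENTS_PER_GROUP):
    """One intervention group plus one comparison group of equal size."""
    return 2 * group_size


def observations_district(teachers=TEACHERS, class_size=STUDENTS_PER_GROUP):
    """All teachers' classes pooled into one combined analysis."""
    return teachers * class_size


def expected_false_positives(tests=TEACHERS, alpha=ALPHA):
    """Expected number of 'significant' results if no teacher truly differs."""
    return tests * alpha


def prob_at_least_one_false_positive(tests=TEACHERS, alpha=ALPHA):
    """Chance of at least one spurious result.

    ASSUMPTION: the tests are independent. Teachers contrasted against a
    shared comparison group are not strictly independent, so treat this
    figure as a rough guide only.
    """
    return 1 - (1 - alpha) ** tests


print(observations_single_trial())         # observations behind one medical conclusion
print(observations_district())             # observations in the pooled district analysis
print(expected_false_positives())          # spurious flags expected by chance
print(prob_at_least_one_false_positive())  # near-certainty of at least one
```

Under these assumptions, a single trial rests on 60 observations and the pooled district analysis on 15,000, yet each teacher's verdict still rests on roughly 30 students. If all 500 teachers were in fact average, about 25 would still be expected to come out as significantly better or worse by chance alone.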

Of course, a value-added economist or statistician would claim that these problems can be mitigated through sophisticated analyses that control for extraneous variables such as (a) poverty; (b) school resources; (c) class size; (d) supplemental assistance provided to some students by remedial and special educators (not to mention parents); and (e) a plethora of other confounding factors.

Such assurances do not change the fact, however, that a value-added analysis constitutes a series of personal, high-stakes experiments conducted under extremely uncontrolled conditions and reported quite cavalierly.

One hopes that most experimentally oriented professionals would consequently argue that experiments such as these (the results of which could cost individuals their livelihoods) should meet certain methodological standards and be reported with a scientifically acceptable degree of transparency.

And some groups (perhaps even teachers or their representatives) might suggest that the individual objects of these experiments have an absolute right to demand a full accounting of the extent to which these standards were met by insisting that students at least be randomly assigned to teachers within schools. Or that detailed data on extraneous events clearly related to student achievement (such as extra instruction received from all sources other than the classroom teacher, individual mitigating circumstances like student illnesses or disruptive family events, and the number of student test scores available for each teacher) be collected for each student, entered into all resulting value-added analyses, and reported in a transparent manner.

A version of this article appeared in the January 16, 2013 edition of Education Week as Putting Value-Added Evaluation to the (Scientific) Test
