Guest post by Monty Neill.
The U.S. stands alone among economically developed nations in its excessive use of high-stakes standardized testing. While the U.S. mandates testing in grades 3-8 and once in high school, with some states or districts requiring more, other nations range from almost no testing (e.g., Finland and Wales) to testing in about three grades. In many nations, the focus is on essay questions, not multiple-choice, and multiple measures are common. Only the U.S. uses student test scores to judge schools and teachers. As a result, tests largely control curriculum and instruction, with harmful consequences and scant progress toward general improvement or toward closing inequitable gaps in learning outcomes.
Across the nation, a rebellion is brewing against testing overuse and misuse. But just saying “no” isn’t enough. In fact, high-quality feedback from assessment is vital to teaching and learning. Students, teachers and parents need to know whether kids are making progress. Communities and taxpayers deserve to know if schools are serving children well and children are succeeding. To win change, activists must offer proposals for better assessment systems coupled with demands to end harmful practices.
FairTest, along with many education, civil rights, disability, religious and other groups, supports a new approach to assessment and evaluation. It would include limited use of low-stakes standardized testing, a school quality review process, and reliance on school and classroom-based evidence of student learning. The controversial part of this approach is using student classroom work to determine student progress or school quality. However, evidence shows that what students actually do in class is a sound basis for evaluation. For example, student grades are better predictors of student college success than are SAT or ACT test scores. Here are two examples of how to use student work in a valid and reliable evaluation process.
The New York Performance Standards Consortium has a variance from all but one state standardized test for its 28 high schools. Instead, students must complete performance tasks in language arts, math, science and history as part of their graduation requirements. The students defend their projects orally before a committee of teachers and outside experts. Importantly, each student works with the teachers to determine the specific tasks they complete. Thus the tasks are as diverse as the students and their interests. It is the scoring guide (or “rubric”) that ensures consistency in evaluation across the range of work.
The results have been powerful. The 26 Consortium schools in New York City are demographically similar to the city as a whole. But their graduation rates, in general and for groups such as students with disabilities and English language learners, are significantly higher; more graduates attend college; and the share of those students still enrolled in college in year three exceeds the national average. This network of nearly 30 schools is larger than most U.S. districts, indicating its approach can be brought to scale.
Another example, the Learning Record, is a carefully structured tool for guiding a process in which teachers observe students, gather and evaluate their work, and determine their progress in language arts. The Record includes multiple sources of evidence of student learning over time (true “multiple measures”). But are the teacher judgments fair and accurate? Over the years, random samples of Records from a variety of schools, including Bureau of Indian Affairs schools, were gathered and re-scored blind. Inter-rater reliability between the re-scorers and the original classroom teachers was commonly between .7 and .8. This is strong evidence of the consistency of teacher judgments in evaluating student learning status and progress. Further, participating educators viewed the re-scoring process (“moderation”) and the feedback to the classroom teachers as excellent professional learning.
In both cases, large numbers of schools with diverse populations were involved in a careful process of evaluating student learning. The assessments, rooted in actual curriculum and instruction, provided trustworthy evidence.
However, there is a major barrier to moving to better assessments: schools are flooded with testing even as many face tight budgets. It is a daunting task to promote performance assessments at a time when the stakes attached to standardized tests are so high.
Still, many teachers employ strong assessments, though this work is rarely systematic and often not shared with colleagues. Such quality work can provide a good starting point. In Nebraska, before NCLB, the state embarked on a statewide system of local, teacher-led assessments. In just a few years, teachers learned to craft technically reliable performance tasks reflecting meaningful student work. As Chris Gallagher explained in Reclaiming Assessment, teachers collectively constructed this system and thereby greatly strengthened their own collaborative work on curriculum and instruction.
It should be possible for school systems, or networks of schools, to begin similar processes. Until we beat back the testing craze, schools will still have to face existing testing mandates. As discussed in part I, removing local mandates and the teacher-evaluation components may well be most important for creating space to develop a performance assessment system, though in the end federal law and policies must fundamentally change.
From outside the schools, test reformers can support school- and district-based improvement efforts and work for a system of performance assessments and classroom-based evidence. They need to do so to answer opponents’ inevitable question, “What would you use instead of standardized tests?” The examples from home and abroad disprove the idea that we need standardized tests to know how well students are progressing.
Testing reformers, then, need clear goals, a strong strategy that includes promoting authentic assessment, and effective tactics. In my next post I will discuss a few such tactics now used across the U.S.
What do you think of these models of authentic assessment? Can they help show us a different way to measure and demonstrate student learning?
- Monty Neill is Executive Director of FairTest.
The opinions expressed in Living in Dialogue are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.