The problem with multiple measures is that reasonable people and, in fact, many assessment experts disagree on the basic definition of what they are.
As high stakes become a threat and, finally, a reality for many students in this country, the critics’ outcry about the evils of testing and the inequity of stakes intensifies. Some would like to abolish the entire standards-based reform strategy and revert to the good old days when all standards, if they existed, were local. Some don’t like any form of testing. Others find the tests acceptable, but are in favor of abolishing the stakes while retaining the test for diagnostic purposes. Some would support standards and stakes but only if the assessment system were broadened to include “multiple measures.” Proponents of “multiple measures” generally want to see additional indicators of learning included in the determination of a student’s competency.
The problem is that reasonable people and, in fact, many assessment experts disagree on the basic definition of a “multiple measure.” I have challenged groups of lay people, education advocates, graduate education faculty, assessment gurus, and others to suggest what the other indicators and assessment tools used might be. The answers have been widely divergent and often fuzzy. There is no consensus on the subject of multiple measures, yet the term has become almost a mantra in the discussion about how to improve state assessments.
As a response to the current discord on assessment and stakes, the idea of multiple measures has appeal. It is positive and constructive, building on the principles embodied in the strategy of holding all students to a high standard, measuring progress, and making performance count. Policymakers are intrigued with the notion that something could be added to the assessment process to make it fuller and fairer, and thereby silence some of the shrillest critics. But policymakers can’t get very far with the multiple-measures concept until proponents become clearer about what it actually means.
To join this conversation in Massachusetts, the Massachusetts Education Reform Review Commission convened a workshop for various stakeholders who wanted to explore the multiple-measures concept. Researchers, policymakers, advocates, and practitioners came together to tackle the elusive question: “What assessment tools or indicators would you add to the current MCAS (the Massachusetts Comprehensive Assessment System) to make it fairer and more comprehensive?”
As the chairman of the commission, I challenged the group to devise a set of proposed “multiple measures” that met six criteria that I felt were essential for viability. These included the following:
- Validity. Do the additional tools accurately measure learning embodied in the standards?
- Reliability. Will the additional measure repeatedly and accurately generate consistent results in a variety of circumstances?
- Transparency. Would its use be understandable and clear to the public, parents, and educators?
- Practicality. Is it relatively easy to compile or administer? Is it feasible for teachers and students?
- Affordability. Are the costs reasonable and affordable?
- Political Feasibility. Could any self-respecting politician stand up and support the use of this measure in public?
All of these criteria are challenging to meet. Many of them are matters of degree and subject to human judgment. For instance, how valid and reliable does an instrument or data set need to be? Despite the obvious difficulty of arriving at such determinations, I am confident that appropriate standards could be agreed upon in each of these areas.
In Massachusetts, we are challenging educators, policymakers, researchers, and especially critics of the current assessment system to propose some additional performance measures, some “multiple measures” that meet the criteria I have outlined. The commission’s first workshop yielded a promising beginning to a process for developing some additional measures of the kind contemplated in the state’s education reform act of 1993. We intend to invest further resources in continuing this work.
But the workshop conversations also made it apparent that, as we suspected, “multiple measures” are far easier said than done. Both locally and nationally, much more work needs to be done in articulating the principles and practice of comprehensive, effective, and useful state assessment indicators and tools.
S. Paul Reville chairs the Massachusetts Education Reform Review Commission and is the executive director of the Pew Forum on Standards Based Reform. The forum is based at Harvard University’s graduate school of education in Cambridge, Mass.
A version of this article appeared in the November 14, 2001 edition of Education Week as Multiple Measures?