Authentic Assessments and the True Multiple-Measures Approach
The best sleuths and scientists know the value of multiple measures. They insist on them to solve mysteries, garner information, and give meaning to raw data.
"When you follow two separate chains of thought, Watson, you will find some point of intersection which should approximate the truth," Sherlock Holmes said to his biographer in The Disappearance of Lady Frances Carfax.
And, when Marie Curie, Holmes's Continental contemporary, discovered that an unknown element in uranium ore was radioactive, she used multiple measures to determine the source of the radioactivity. After processing huge amounts of ore, Madame Curie obtained the first known pure sample of radium, and then she measured its chemical properties to prove that it was in fact a new element.
The point is as basic as it is important: Multiple measures are the only sure way to get valid, reliable, and fair results. But apparently the point is too simple, or makes too much sense, to be taken seriously by many education policymakers as well as some lawmakers on Capitol Hill and in many statehouses across the country. For what I am hearing in the continuing debate about how our nation's students should be tested is talk of the wholesale abandonment of one type of test in favor of another. And, because a substantial portion of the often acrimonious debate over testing still seems to get stuck on norm-referenced, multiple-choice tests (a common format that has been used for many years) versus performance assessments (a newly resurrected test format), guess which type of test gets pitched out the window? You got it. The one with the more common format.
Before going any further, let me say something to those who have either made up their minds about which side of the fence they stand on in the norm-referenced-test/performance-assessment debate or who have read my credentials at the end of this article and are trying to figure out what kind of soap I'm selling. I'm not trying to sell anything except some common sense. The test publisher for which I work researches, designs, sells, and implements testing programs based on both norm-referenced tests and performance assessments, as do most major test publishers.
Many norm-referenced tests are outcome based. They provide cost-efficient, valid, reliable, and fair data that school districts need to make important program decisions regarding curriculum and resources, and they help serve as an accountability tool. Results from norm-referenced tests also often serve as community report cards and give local and state educators--the people charged with the job of improving student achievement--the chance to see how their students compare with children in other communities and states.
They are not the only measure, but they are one important measure. Such tests can help monitor schools' progress and make them more accountable. They can also evaluate the effectiveness of instruction and focus curriculum. Norm-referenced, multiple-choice tests provide data that, in the absence of national standards, compare students against a norm. But they are one, and only one, outcome measure. Performance-based, or "authentic," assessments can also play an important role in the assessment matrix in which multiple measures are needed to accurately assess students.
The debate should not be about one measure over another, or new versus old. What Madame Curie, Sherlock Holmes, and a host of past and present scientists, sleuths, thinkers, and educators know is this: No test, no matter how good it is, should be the sole criterion. They know that various tests are designed for different purposes, and they know that you sometimes have to develop your own measures to complement reliable, standardized measures. No single test or assessment can tell us everything we need to know about a student. Multiple measures are needed to determine a student's progress and achievement. Some tests, for example, are designed to indicate whether students need additional work in specific subjects, while others measure overall group progress toward broadly stated goals.
And it is precisely here that the policy debate over a single measure--norm-referenced, multiple-choice tests versus performance assessments--begins to seem silly. For in many communities across this country, educators are doing the right thing by using performance assessments while at the same time building on the strengths of norm-referenced tests as part of a multiple-measure approach. In doing so, they are opening a new era in assessment that symbolizes the dynamics that are possible when collaboration and local control of testing are allowed to flourish.
States and local school districts are collaborating with major test publishers to design assessments that fit each community's needs and that best reflect local education objectives and the national education goals. Test publishers are not the enemy when it comes to assessment. They are working with communities to help them create their own unique education systems--systems that will graduate productive and fulfilled citizens for the next century.
For instance, in the Baltimore city schools this autumn, approximately 50,000 students in grades 2-5 will not only collaborate on tasks geared toward Baltimore's new geographically based curriculum but will also assist in the scoring and come to understand the reasoning behind the test. Their teachers are helping to develop the assessment tasks and are ultimately responsible for scoring the tests. They will also decide when the tests should be administered and how the evaluation information should be used.
Twenty-four of the best teachers in Baltimore recently got together with testing officials to debate, design, and collaborate on the assessment tasks. Is the reading level appropriate? Is the literature on which the tasks are based readily available in Baltimore classrooms? Are the tasks related to the children's classroom experiences and/or everyday lives? For instance, one assessment might involve a problem-solving task about the Baltimore harbor or about Oriole Park at Camden Yards, the city's new baseball stadium. Or, a science task might ask students to determine why crabs are indigenous to Maryland. These are but a few of the areas teachers are delving into as part of this new program. For them, having a voice about how and when the assessments are used creates a sense of empowerment that can be found in very few school districts. And, because the tests are not "high stakes," teachers do not have the anxiety and fear that are often associated with programs where they are held accountable for student test results.
The tasks, to be given throughout the year, complement and support the activities students will find on the statewide assessment that is taken at the end of the year. This provides students time to familiarize themselves with the performance-assessment tasks, helps them become confident about their abilities, and creates a sense of ownership around their educational progress. Portfolios of the students' work are collected by the teacher, so individualized assistance can be provided before the state test begins. That test, by the way, is a performance assessment administered as part of the Maryland School Performance Assessment Program. It in turn is balanced by norm-referenced tests. That's a multiple-measures approach.
Or, take a look at Redwood City, Calif. Last spring, Redwood City, a town south of San Francisco with a high percentage of Spanish-speaking students, set out on a course that, like Baltimore's, will change the way assessments are created. District officials decided that the curriculum needed revision and staff development could be improved. They decided to start from scratch, and erased all the old notions of curriculum and assessment. Standards and outcome goals were established. A new curriculum was developed to meet those outcomes. And C.T.B. Macmillan/McGraw-Hill was asked to assist in developing new performance assessments for grades 2, 5, and 7 that would provide for continuous assessment of the new program.
As in Baltimore, local teachers were an integral part of the process, working side by side with testing experts in creating tasks that would have meaning for students and accurately assess how well they are learning. Math problems were created that would yield a sense of how students communicate about math; instead of just providing the answer, they also had to explain in writing how the answer was derived.
The purpose behind the assessments, as in Baltimore, is to familiarize students with the types of tasks on the state performance tests--the California Learning Assessment System--given in grades 4, 8, and 10. Teachers and local officials were the impetus behind the development of the new tests and were the driving force that turned the standards and outcome goals from mere wish-list items to a functioning program based on the use of multiple measures.
The outcomes from valid, reliable measures give educators the power and control they need at the local level to help their students and schools succeed. The axiom "Information is power" is quite true in this instance. Teachers, principals, and administrators need a continuous loop of information about how students are performing in the classroom and what additional resources may be needed to improve instructional practices and to increase student achievement. This is especially important both in urban areas with significant populations of children with special needs or culturally diverse students with limited English proficiency and in more isolated rural locations where students have been denied high-quality education due to poverty or distance.
What communities and school districts are learning is the value of multiple measures: Local performance assessments, statewide testing programs, and norm-referenced tests together yield a broad range of information and reporting. They also provide school districts and states with a variety of accountability measures that no single test can provide.
The use of multiple measures is a deceptively simple idea, but one that is absent from many of our policy debates. Let us hope that as the discussions continue our policymakers will remember Baltimore, Redwood City, and the habits of our best sleuths and scientists. After all, as Sherlock Holmes remarked in A Study in Scarlet, "It is a capital mistake to theorize before you have all the evidence."
Michael H. Kean is the vice president of public and governmental
affairs for C.T.B. Macmillan/McGraw-Hill in Monterey, Calif. He also
chairs the Test Committee of the Association of American