This post is by Joan Herman, co-director emeritus of the Center for Research on Evaluation, Standards, and Student Testing at the University of California, Los Angeles.
It’s disturbing to read about the fragmentation of the testing landscape as states pull out of PARCC and Smarter Balanced and go back to their own tests. I understand budget limitations and recognize that cost is an issue for states. At the same time, it’s wise to remember the old adage: “You get what you pay for.” Moreover, if history is a guide, these state-developed tests will not meet the challenge of the Common Core State Standards.
The low level of many state tests, in fact, is what motivated the federal investment in the two state consortia. Recall the data from two telling studies: RAND’s analysis of the depth of knowledge assessed by released reading, writing, and mathematics items and tests from the 17 states reputed to have the most challenging assessments, and Norman Webb’s studies of the alignment between four high school tests and the Common Core State Standards in English language arts (ELA) and math. Both studies draw on Webb’s depth of knowledge (DOK) framework, in which 1 = rote learning; 2 = simple application; 3 = applications requiring abstract thinking, reasoning, and synthesis; and 4 = extended problem solving. Bob Linn and I have argued that DOK3 and DOK4 are important aspects of deeper learning.
Results from the RAND study showed that virtually all of the selected- and constructed-response items in mathematics were classified as DOK1 or DOK2. In reading and writing, the picture was similar for selected-response items, but constructed-response tasks fared better: more than half of the constructed-response reading tasks were at or above DOK3, and in the states that had writing assessments (only 8 of the 17), the writing prompts were nearly uniformly classified at DOK3 or DOK4. The study makes clear the association between task type and depth of knowledge: reaching the highest DOK levels takes constructed-response items and extended writing or performance tasks. Yet these are exactly the item types that are most expensive in testing time and scoring costs.
Norman Webb’s findings may be particularly apropos for states that plan to stick with their existing high school tests. His studies examined the correspondence between the ACT, SAT, PSAT, and PLAN and the ELA standards for grades 9-10 and 11-12, as well as the high school math standards for grades 9-12.
At the most general level, the study found that both the ACT and the SAT had at least six items assessing the majority of the seven strands in the ELA standards: 6 of 7 for the ACT and 5 of 7 for the SAT. The strands are reading for literature; reading for information; writing; language; reading for literacy in history/social studies; writing for literacy in history/social studies, science, and technical subjects; and reading for literacy in science and technical subjects. Absent a writing sample, PLAN addressed five strands, while PSAT addressed only three. Across all exams, the language strand drew the largest share of items, and coverage at the finer-grained level of individual standards was spotty. Further, Webb and his reviewers noted that all of the tests tended to assess the more general, lower-level skills in the CCSS ELA standards rather than the more complex expectations of analysis and inference. These impressions were mirrored in the DOK ratings, which averaged 1.6 for the ACT and 1.9 for the SAT on the four-level scale.
Results were similar in mathematics. Both the ACT and the SAT addressed all five conceptual categories defined by the standards: Number and Quantity, Algebra, Functions, Geometry, and Statistics and Probability. Slightly more than half the items on each exam specifically addressed one or more of the high school standards, while roughly a third of the items were judged to be at the middle school level. The ACT addressed about a quarter of the high school standards, and the SAT about 20 percent. Coverage was thinner for PLAN and PSAT, which provided at least six items for only two of the five conceptual categories, Algebra and Geometry, falling short on Functions and on Statistics and Probability. Application of the mathematical practices was virtually absent from all four exams.
In short, the alignment between these four high school tests and the Common Core State Standards is weak. David Coleman, president of the College Board, has essentially confirmed this observation in announcing plans to redesign the SAT.
How can we help ensure that states move forward with new assessments that capture the spirit and content of the Common Core? Criteria to guide state decision making already exist. For example, a paper by nearly two dozen assessment experts, “Criteria for High Quality Assessment,” lays out five standards:
- Assessment of Higher-Order Cognitive Skills
- High-Fidelity Assessment of Critical Abilities
- Standards that Are Internationally Benchmarked
- Use of Items that Are Instructionally Sensitive and Educationally Valuable
- Assessments that Are Valid, Reliable, and Fair
The Council of Chief State School Officers has also laid out requirements that states should use in their requests for proposals (RFPs) for standards-aligned assessments.
How can we help ensure that states actually use this guidance?