States have come a long way in designing standards and tests, but not far enough.
The success of standards-based school improvement depends heavily on two elements: strong standards, and assessments that measure what the standards expect. But previously unreleased research findings suggest that states still have a long way to go in making their standards good enough and in designing tests that adequately reflect those standards.
A series of analyses by Achieve--a nonprofit group in Cambridge, Mass., created by governors and business leaders to promote standards-based reform--suggests that while state standards have become clearer and more specific, many are still too vague and all-encompassing. They tend to repeat the same set of topics from grade to grade and omit some of the more challenging academic content stressed in other, high-performing countries.
Moreover, the alignment between state standards and tests is often faulty. Some tests tend to measure some standards but not others. And they often focus on less demanding knowledge and skills, rather than on the more ambitious content spelled out in the standards. While most of the tests examined by Achieve were developed by states to match their standards, a handful were commercially prepared, off-the-shelf exams.
Because the quality and alignment of state standards and assessments are so crucial for standards-based reform to work, Education Week asked Achieve to work with the newspaper's staff to summarize the group's findings for Quality Counts 2001.
"The standards themselves are stronger than when states first started developing them in the early 1990s," says Matthew Gandal, the vice president of Achieve. "The tests also are stronger than a decade ago, when there was no intention of aligning them with standards because there weren't any."
"However," he adds, "in some cases, the standards and tests are still not as strong as they need to be to move the schools in this country as far as we want them to move in the new century."
Since 1998, Achieve has worked with more than 20 states to examine both the quality of their academic standards and the match between those standards and state assessments.
In one set of studies, the group compared state standards and assessments against the expectations of other, high-performing countries. Achieve researchers used the analysis of curriculum expectations created as part of the Third International Mathematics and Science Study, a comprehensive look at performance and instructional practices in 41 countries.
In addition, Achieve conducted a more in-depth analysis of state standards and tests in nine states.
‘Mile Wide, Inch Deep’
TIMSS was the most extensive cross-national study of student achievement and educational practices ever undertaken. In addition to testing students' knowledge and skills in math and science in grades 4 and 8 and at the end of high school, researchers analyzed curricula in the participating countries. The analysis looked both at the "intended" curricula laid out in official documents and at the "enacted" curricula that teachers actually used.
The purpose of the investigation was two-fold: first, to determine the common elements of the curricula in the participating nations in order to craft a cross-national test of achievement; and second, to provide background information to help researchers understand the factors that influence learning.
U.S. students placed in the middle of the pack in most grade levels tested on the TIMSS exams, with performance slipping as students moved from elementary school to high school. The accompanying curriculum analysis attributed the undistinguished performance to a failure of most American schools to teach math and science in a coherent way that leads to in-depth understanding. The TIMSS researchers found that, compared with those in other nations, math and science textbooks in the United States were overstuffed with topics, none of which was taught in sufficient depth. In the words of the researchers, the standards in this country are "a mile wide and an inch deep." They also found that the high-performing TIMSS countries tended to have a more focused set of standards.
In 1998, Achieve commissioned William H. Schmidt, the national research coordinator for TIMSS and a professor of education at Michigan State University in East Lansing, to examine how the standards and tests in 21 states compared with those of the top-achieving countries. Using the framework created for TIMSS, Schmidt scrutinized the standards in math and science and the tests in those subjects in the 21 states. He then compared the documents with those in the nations that performed highest on the international study. The analysis considered:
- The content areas included in the state standards and tests;
- The grade levels at which the content areas are introduced and expected to be covered; and
- The proportion of test items devoted to the content included in the curricula of the best-performing nations.
In grade 8, only 13 of the 21 states provided both their math standards and tests for analysis.
Achieve’s ‘Benchmarking’ Initiative
In 1998, Achieve also launched a "benchmarking" initiative to provide an in-depth analysis of standards and tests for interested states. The effort began with studies of standards and tests in Michigan and North Carolina. Since then, Achieve has conducted similar analyses for another seven states. The Michigan and North Carolina analyses examined standards and tests in four subjects: English/language arts, mathematics, history/social studies, and science. The other analyses examined only English/language arts and math.
The studies consisted of two main phases. First, experts convened by Achieve analyzed the standards documents by comparing them against "benchmarks" selected by a broader group of master teachers, academicians, and curriculum specialists as among the best in the world. In English/language arts, state standards were compared against those in California and Massachusetts; in math, the benchmark standards came from Arizona and Japan. In addition, the experts compared state standards for early literacy with those from North Carolina, Texas, and the New Standards project.
In conducting their inquiries, reviewers considered the rigor of the standards, their comprehensiveness and focus, and their clarity. Side-by-side comparisons with the benchmark documents enabled the reviewers to determine, among other characteristics, if a state's standards provided an equivalent level of guidance, and whether a state expected students to demonstrate the same knowledge and skills as the benchmark states at a similar grade level.
After examining a state's standards, the Achieve reviewers looked at the state's tests to see whether they measured what the standards called for. In that analysis, the reviewers considered the content of the test, the performance expected of students, the test's rigor, and the depth and breadth of the content being measured. In addition to allowing judgments about the quality of state tests, that review provided an important window into the quality of state standards. For example, it is difficult to gauge alignment if a state's standards are not well-developed or are too vague.
Quality of Standards
Both sets of studies found that state standards have improved dramatically in the past few years. In 1996, when only 14 states had standards in all four core subjects, reviews found that most standards lacked clarity and specificity and were not well-grounded in content. Many standards included expectations that were unmeasurable, such as students' enjoyment of reading.
Today, 49 states have standards, and the picture is far different. Achieve found that the overwhelming majority of state standards now are clear and jargon-free. For the most part, the standards also are written in measurable terms.
An Achieve study of 8th grade math standards and assessments in 13 states shows that state standards span a large number of topics and that many of the topics students are expected to master are not included on their state’s test. In one state, 8th grade math standards cover 32 core TIMSS topic areas. At the same time, 57 percent of those topics do not appear on the state test.
Moreover, states are continuing to refine their standards, with noticeable results. Last year, for example, Indiana revised its standards following an Achieve review, and the group has since described those standards as among the strongest in the nation.
But states have room for improvement. Although many state standards are clearer than before, some are still both too fuzzy and too exhaustive to provide sufficient guidance to teachers or test developers. One state, for example, expects all students to "read literally, inferentially, and critically." (Achieve did not identify states by name.)
While states may want students to demonstrate such abilities, Gandal notes, such a standard says nothing about how to craft lessons or exercises to help students accomplish the goal or how to judge if a student has achieved the desired results. Almost anything can be taught or tested under such a wide net.
In part, such vagueness may reflect states' reluctance to dictate to districts or schools how to teach. "At the beginning of the standards movement, I think many teachers and others in local school districts said, 'Stay away from the curriculum; this is our issue,'" says Gandal. "I think what we've seen over the decade, as accountability has kicked in, is that people in local districts wanted more specific standards."
Achieve has found that some of the newer standards, such as the English standard in Massachusetts and the math standard in Arizona, have done a better job of walking the line between state guidance and local control.
Those standards show that it is possible at the state level to provide clear and specific expectations for what students should learn, according to Achieve, while still accommodating multiple programs of study in school.
States also need to increase the rigor of their standards, the Achieve studies suggest. When held up to those of high-performing nations, many state standards are not as challenging as they could be.
In some states, the expectations for 8th graders are equivalent to what other countries expect elementary students to master. For example, one state expects 8th graders to apply measurement formulas that Japanese students are expected to apply in grades 4 and 5. The same state asks students to add and subtract fractions with like denominators in 5th grade; Japan expects students to add and subtract decimals and fractions with unlike denominators in 3rd grade.
Those lower expectations reflect a pattern of repetition and redundancy in many state standards. States simply include the same topic year after year, without expecting students to acquire a more sophisticated understanding of content or more complex reasoning skills over time. Because of that redundancy, the standards for students in the upper grades often are crammed with so many expectations that teachers cannot teach any of them in depth--the “mile wide, inch deep” phenomenon encountered by the TIMSS researchers.
For example, Schmidt's analyses showed that states included as many as 40 math topics in their standards for grade 8, compared with fewer than 20 in high-performing countries like Japan. Those countries expect students to master topics and move on. The Japanese standards do not even mention measurement after 5th grade--a sharp contrast to the U.S. state mentioned above.
The Achieve studies also found that, compared with those of high-performing nations, state standards omit important content. Many states begin their standards with 4th grade and provide little guidance about beginning-literacy instruction for kindergarten or 1st grade teachers who are teaching children to read. Yet without a strong foundation in reading, children will be ill-prepared to meet higher expectations at the end of elementary school.
A number of states also leave out important content in mathematics, according to the Achieve analyses. They do not specify that students should learn the fundamental concepts and properties of algebra and geometry. They tend to neglect such topics as congruence, quadratics, slope, and trigonometry, which high-achieving countries emphasize in 8th grade.
Measuring Less Complex Skills
As with the standards, Achieve found that state tests have improved substantially in recent years. Many represent a far cry from the minimum-competency tests of a previous generation.
In two of the states examined by Achieve, the tests are particularly challenging--in fact, more rigorous than the states' standards.
In those states, the tests are challenging and measure important knowledge and skills. In English/language arts, the tests ask students to show their comprehension of reading passages and to use their understanding to make inferences based on what they have read. In math, the tests pose challenging problems that enable students to demonstrate their understanding of important concepts.
While the other state tests are not as strong, all of them show significant improvement over those of the past.
Notably, they all employ a mix of test formats. Although states continue to rely largely on multiple-choice questions, the tests include at least some open-ended ones that ask students to show a deeper level of understanding. Often, however, such open-ended questions call on students to write only brief, short-answer responses.
Writing assessments are particularly strong, Achieve found. They not only provide a good gauge of students' ability to produce prose, they also send an important signal to teachers that writing ability is important.
Achieve's alignment studies also found that state tests do, in fact, measure much of the knowledge and many of the skills delineated in the states' standards. Tests, in many cases, do an effective job of measuring basic knowledge and skills and, in most cases, attempt to measure such higher-order abilities as reasoning and problem-solving.
But the attempts to measure such higher-level cognitive tasks are not always successful. In some cases, what passes for problem-solving is little more than slightly more challenging arithmetic operations. Students are asked simply to plug numbers into ready-made formulas, rather than to determine which solutions are appropriate and if their solutions are reasonable. Rarely are students asked to collect data themselves.
In other cases, tests measure only the least complex of the skills called for in the standards. In part, that problem reflects the all-encompassing nature of state standards.
Some standards ask students to demonstrate a number of skills, such as identifying geometric properties and using them to solve problems. But the tests may ask students only to identify the properties. In such an instance, the test item may be "aligned" to the standard in a superficial sense, but it does not measure all of what a state expects students to demonstrate.
Achieve also found that the tests tend to be "unbalanced" in that they measure some standards but not others, which is understandable. A single test is unlikely to measure all of a state's standards. In some cases, the state chooses to emphasize the most important concepts or skills. But in others, more demanding content is omitted from the tests.
And sometimes it was not clear that a state had acted deliberately in choosing what to test. Instead, the choice seemed haphazard. In one state, for example, almost half an 8th grade reading test assessed a single standard: "Make inferences and draw conclusions." While that objective is worthwhile, it leaves little room for other standards that are also important but are not tested, among them, word recognition and vocabulary, Achieve notes.
Similarly, in math, some state tests tend to focus on number and measurement, leaving little room for standards related to algebra and geometry. In 8th grade, states tend to measure arithmetic, which other nations expect their students to master in elementary school. In one state, for example, nearly half the test items in grade 8 measured whole-number operations, and 40 percent were devoted to measurement. (Some questions could have tapped both topics.)
In contrast, only one-fourth of the items measured two-dimensional geometry, and another one-fourth measured functions, relations, and equations--the fundamentals of algebra.
"In some cases, what we're finding is that the tests and what they value and what they measure isn't always as rigorous as you think it should be from looking at the standards," Gandal says. "The question that raises for us, and that it ought to raise for states, is what are you really encouraging teachers to do in the classroom? To aim high or to aim lower, where the tests are truly pegged?"
One of the most serious shortcomings identified by the Achieve analyses involves the level of challenge state tests pose for high school students. Simply put, the difficulty of tests tends to reach a plateau or decline between middle school and high school. In one state, for example, the 4th and 8th grade math tests are demanding and ask students to demonstrate knowledge and skills that are important and appropriate for those grades. But the 11th grade test is considerably easier and asks questions that are more appropriate for middle school students.
Three of the toughest issues on the horizon, as identified by Achieve, are where to set the bar for student performance, particularly for high school graduation; how fast states should expect performance to improve; and how to identify multiple measures of student learning.
In general, Gandal points out, American standards have not yet addressed the fundamental question: How high is high enough? To answer that question, he says, states need to figure out what students need to know and be able to do in order to succeed in the workplace or in college after they graduate from high school. States must engage business and higher education institutions in getting answers to that question in ways that those groups have not been engaged so far, he argues.
In addition, states have taken divergent paths in setting standards. Some, like Texas, set them relatively low and steadily ratchet them up over time. Others, like Massachusetts and New York, set them high at the outset. While performance has improved, it is still far from the targets Massachusetts and New York set. States need to determine which approach works the best.
States also need to consider the use of multiple measures of student performance. Most experts agree that a student’s educational fate should not rest on a single test taken at one point in time. But, for the most part, the testing and education communities have not yet agreed on what additional measures states might use to see if students are meeting state standards, let alone how to incorporate such measures into their accountability systems.
State leaders need the advice of measurement experts and others in the education community on such issues, Gandal says.
“Everyone seems to agree there ought to be multiple measures,” he says, “but there are precious few models of what this would look like while maintaining a common, high standard.”
Gandal contends that states are willing to tackle such tough issues. The fact that some states have elected to subject their standards and tests to independent scrutiny proves that states want those initiatives to succeed, he says.
But states cannot address such challenges alone. “It’s incumbent on all of us who believe in high standards for all students to not simply criticize states for what doesn’t work,” Gandal says, “but to help come up with concrete solutions.”
Vol. 20, Issue 17, Pages 33-36, 38, 40
Published in Print: January 11, 2001, as Gaining Ground