Accountability Systems 'Mediocre,' Study Finds
Not a single state in America has its whole system of standards, tests, and accountability policies "right," argues a report from two Washington-based groups.
"Grading the Systems: The Guide to State Standards, Tests, and Accountability Policies" judges states on the quality of their reading and math standards, the content of their tests, the match between their standards and tests, test rigor, the technical trustworthiness of their tests, and their accountability policies.
The project, underwritten by the Smith Richardson Foundation, evaluated 30 state systems in each category on a scale from 1 to 5, with 5 denoting "outstanding" performance.
Although some states—such as Massachusetts, Pennsylvania, and Virginia—got high marks in three out of six categories, the multistate average is only "mediocre," according to the report published by the Thomas B. Fordham Foundation and AccountabilityWorks, a consulting group.
But what's most notable about the review is what's missing.
Read the accompanying story, "Ratings by State."
States were included in the study based on whether copies of the state-administered tests in reading and mathematics (or an equivalent form of such tests) could be obtained for analysis. A few states, such as Massachusetts, post previously administered tests on their Web sites. Colorado, Illinois, Michigan, New York, and Pennsylvania agreed to make secure test forms available for review. In other cases, the reviewers used an alternate form of an off-the-shelf test that was available from the publisher but not identical to the one given in the state.
But in the vast majority of cases, the authors complained, states would not release basic information that should be available.
"While we did feel that it was a significant accomplishment that we could look at 30 state testing systems, it was unfortunate that 20 states did not feel the need to share their tests under secure conditions," said Theodor Rebarber, AccountabilityWorks' president.
Given that state accountability systems drive so much of what happens at the district, school, and classroom level, he said, "there's a need for some external, independent review of what's under the hood, so to speak."
Bill Reinhard, a spokesman for the Maryland education department, said officials there could not recall the request. But, "we rarely let any kind of researchers look at our tests, even under secure conditions, because we don't own a lot of the test items.
"There's a lot of hoops to go through," he added, noting that the process is difficult and time-consuming, "and if we started to do that, there's some concern that we'd have to start doing it for a lot of different organizations."
No Guarantee of Quality
To review state systems, the project assembled a team of individuals with expertise in the relevant academic-content areas and grades. With their advice, AccountabilityWorks established a set of "reference standards" that cover what the organization considers to be essential math and reading skills at each grade level. The reviewers then judged state systems against those standards.
The project also reviewed the "trustworthiness," or reliability, of state tests based on criteria devised with help from Susan E. Phillips, a psychometrician and lawyer in private practice.
The project examined both criterion-referenced tests crafted by states to specifically match their standards and commercially produced norm-referenced tests that are used as part of state accountability systems. The latter measure how students stack up against a nationally representative sample of their peers.
But the study didn't find that one type of test was always better.
In math, for example, the reviewers discovered that the content of norm-referenced tests was significantly better than that of criterion-referenced tests in the elementary and middle grades, and significantly worse at the high school level. That's because, with a few exceptions, the norm-referenced tests for high schools incorporated only limited amounts of high school math, such as algebra or geometry.
The authors also found that while states' custom-designed tests were better aligned with state content standards than were off-the-shelf exams, "the difference is not nearly as large as one might expect."
But Mr. Rebarber said the weakest dimension of state systems was the rigor of state tests, that is, where states set the cutoff scores students must reach to perform at the proficient level. Even states with challenging standards and tests often had cutoff scores that more closely resembled "minimum competency expectations," he said. Massachusetts alone earned a strong overall rating on test rigor.
The researchers concluded that 18 states would have solid accountability systems if the federal No Child Left Behind Act were "fully and properly implemented."
Vol. 23, Issue 23, Page 18