As states add reading and math exams in previously untested grades to comply with the No Child Left Behind Act, they will have to determine the level of performance considered “proficient.”
UPDATE ON NCLB
An Education Week Special Pullout Section: A Progress Report on the No Child Left Behind Act
In particular, states must figure out how to make their achievement standards on the new tests mesh with those in the grades already being tested, so that the progression of growth expectations across grade levels is smooth. Otherwise, 4th graders who are rated proficient in mathematics one year may suddenly score below that level the next simply because the standard, or cut point, has shifted.
“I think it’s causing some difficulties,” Robert L. Linn, a professor of education emeritus at the University of Colorado at Boulder, said of state efforts to set performance standards.
A survey conducted this fall by the Editorial Projects in Education Research Center found that at least 11 states set new achievement levels in reading/language arts in the 2004-05 school year. About nine states did so in mathematics.
Those numbers are expected to grow substantially this school year, as nearly half the states administer reading and math tests in more grades.
The federal law requires states to give tests in reading and math in grades 3-8 and at least once in high school, starting this school year.
The combination of new performance standards and new tests will make it even harder to tell whether schools are really improving, as judged by whether they make adequate yearly progress under the federal law.
As states add performance standards or revise the scores students need to qualify as “proficient,” it may be unclear if the bar has been raised, lowered, or kept largely the same.
Figuring out the height of the bar is not easy, responses to the EPE Research Center survey show.
States are moving to meet the testing requirements of the No Child Left Behind Act. Almost all states will test students in reading and math in grades 3-8 and once in grades 10-12 in 2005-06. Slightly fewer than half of all states currently have standards-based science tests in each of three grade spans: 3-5, 6-9, and 10-12. The nearly 4-year-old federal law requires states to have such science tests in place by 2007-08.
Note: The District of Columbia is included in this analysis. Total state count = 51.
SOURCE: Editorial Projects in Education Research Center, 2005
In Arizona, for example, officials held a series of meetings last May to set new achievement levels in reading, math, and writing for their state’s tests in grades 3-8 and high school. State officials report that high school students now must answer a lower percentage of items correctly to meet the proficiency standard. But the tests also now contain more items, so students must show more knowledge to be rated proficient.
Arkansas also set new performance levels in 2005 for its reading and math exams. “On balance, the cut scores are generally comparable,” a state education department official reported, “although at one particular grade or another, the cut score may be somewhat higher or lower.”
“It is difficult to make an exact comparison,” the Arkansas official continued, in response to the EPE Research Center survey, “since the content standards being measured have been revised and since the design of the literacy portion of the examination has changed.”
What states want to avoid, said Scott Marion, a vice president of the Center for Assessment, a Dover, N.H.-based group that works with states to improve their testing-and-accountability systems, are erratic swings in performance from grade to grade because of where they’ve set the bar.
“If you have an assessment in grades 4, 8, and 11 and now you’re going to fill in the rest of the grades, do you go back and completely revisit all your performance standards, which some folks are doing,” Mr. Marion said, “or do you try and set new standards for the new tests and live with your old ones where they were?”
Some states, such as Arizona, have developed a single “vertical” scale that summarizes student achievement across grade levels, at least in grades 3-8.
Such scales, according to Robert W. Lissitz, a professor of education at the University of Maryland, College Park, assume that tests at different grade levels focus on similar math or reading concepts even though they measure different content. Students are expected to improve on the scale each year as their math or reading skills increase.
But Mr. Lissitz and other assessment experts say that vertical scales are hard to construct and are based on questionable assumptions about how common the content really is across grades.
He and others advocate what they call “vertically articulated” or “vertically moderated” standards. Such methods rely on a combination of human judgment and statistical analyses. They consider both the content standards and test difficulty in each grade, along with data on how students actually perform, to set cutoff scores.
The assumption, said Mr. Marion of the Center for Assessment, is that if 50 percent of a state’s 3rd graders are proficient in mathematics, “and you don’t think 4th grade math is all that different, your best guess is 50 percent of the kids should be proficient in grade 4, too.”
“That’s not deterministic,” he said. “It allows you to set a starting point.”
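Mr. Marion’s rule of thumb amounts to matching percentages across adjacent grades. The short Python sketch below illustrates that idea only; the function names, the score scale, the cut scores, and every student score in it are hypothetical, and it is not presented as any state’s actual standard-setting procedure. Given the share of 3rd graders rated proficient, it finds the highest 4th grade cut score that still rates at least that share of 4th graders proficient.

```python
import math


def share_at_or_above(scores, cut):
    """Fraction of students whose scale score meets or exceeds the cut score."""
    return sum(score >= cut for score in scores) / len(scores)


def matching_cut_score(scores, target_share):
    """Highest cut score that still rates at least target_share of students proficient.

    A simple percentile-matching starting point, not a final standard.
    """
    if not scores or not 0 < target_share <= 1:
        raise ValueError("need at least one score and a share in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Number of students who must clear the bar; the tiny tolerance guards
    # against floating-point error nudging the product just above a whole number.
    needed = max(1, math.ceil(target_share * len(ranked) - 1e-9))
    return ranked[needed - 1]


# Hypothetical scale scores for ten 3rd graders and ten 4th graders.
grade3 = [250, 270, 290, 300, 310, 320, 330, 345, 360, 380]
grade4 = [280, 295, 305, 318, 326, 333, 341, 350, 372, 390]

grade3_share = share_at_or_above(grade3, cut=320)  # hypothetical grade 3 cut score
grade4_cut = matching_cut_score(grade4, grade3_share)
print(grade3_share, grade4_cut, share_at_or_above(grade4, grade4_cut))  # 0.5 333 0.5
```

As Mr. Marion says, such a number is only a starting point; panels reviewing the content and test difficulty in each grade remain free to move the bar up or down.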
‘A Purposeful Act’
In Michigan, for example, curriculum standards were revised in 2004, based on the NCLB testing requirements, so that grade-level expectations are now more rigorous and specific. The state not only added new tests in grades 3, 5, 6, 7, and 8, but also shifted from a spring to a fall testing date.
The state plans to set performance standards and cutoff scores for the new tests in late December and early January. As one step in that process, said Ed Roeber, the state’s testing director, committees will review books in which test items are arranged in order of difficulty and determine where to set the proficiency bar.
Those books also will show where that bar would be placed to maintain a level of proficiency consistent with that in adjacent grades. From there, Mr. Roeber said, the committees’ decisions are essentially unconstrained.
“Even at the grades where we’ve had tests, the committees could set standards higher or lower,” he said. “We don’t want them to do it by accident; we want it to be a purposeful act.”
For now, said Mr. Lissitz, “nobody has a real solid answer” on the best method for making such judgments.
“It’s a hard thing, because the models that we have are being developed as we speak,” he said. “Right now, we have answers, but they’re not as satisfactory as they will be in a couple of years.”