A 'Proficient' Score Depends on Geography
When Colorado officials reported that only 14 percent of 10th graders scored at the "proficient" level or higher on the state math test last year, the news seemed grim—until a group of local superintendents asked researchers to figure out what "proficient" means in Colorado.
The analysis, released this month, concludes that the content of the Colorado exam is "substantially more difficult" than the SAT and "more difficult" than the ACT, the two leading college-admissions tests. Teenagers who performed at the proficient level on the Colorado exam, in fact, scored at or above the 90th percentile on PLAN, a test the ACT designed to predict students' performance on its admissions exam.
Colorado's experience illustrates just how slippery a term "proficient" can be. What exactly does proficient mean? The answer, it appears, depends on where you live. Yet how states define the word is at the heart of the reauthorized Elementary and Secondary Education Act.
States that have set very high standards may find it difficult to bring all students up to the proficient level by 2014, as the new federal law requires. Consequently, some educators fear, the threat of losing federal funds could encourage some states to race to the bottom.
"If you look at the different states, the tests vary in terms of how stringent the content standards are that they're supposedly aligned with. They vary in how rigorous the tests are. And they vary in how high the [performance] standards have been set," said Robert L. Linn, a professor of education at the University of Colorado at Boulder. He is also a co-director of the National Center for Research on Evaluation, Standards, and Student Testing, a federally financed research center at the University of California, Los Angeles.
"The worry I have," Mr. Linn said, "is that states that have been doing a good job (that is, they've established ambitious targets) are going to be tempted to lower their standards and to water down their tests. And that would be counter to what I think most of the people who were behind the law really wanted to happen."
A new analysis by Education Week shows just how varied state standards can be. Education Week researchers compared the percentage of students who scored at the proficient or a similar level on state mathematics tests in 2000 with the percentage who scored at the proficient level on the National Assessment of Educational Progress, the federal program that tests samples of students in key subjects. The analysis included the 25 states that participated in the state-level NAEP that year, administered a state math test in grade 4 or 8, and reported results by achievement levels.
In North Carolina, for instance, 84 percent of 4th graders scored at the proficient level on the state test, while only 28 percent scored at that level on NAEP. In Wyoming, the proportion of 4th graders scoring at the proficient level on both the state and national tests was closely matched, at 27 percent and 25 percent, respectively.
Only Idaho, Louisiana, Missouri, North Dakota, and Rhode Island had a smaller share of students scoring at the proficient level on their own tests than on NAEP in the 4th or 8th grade.
That states may have widely different definitions of what counts as proficient has been pointed out since at least 1996. That's when Mark S. Musick, the president of the Atlanta-based Southern Regional Education Board, wrote a report in which he noted that "state standards for student achievement are so dramatically different that they simply don't make sense."
Mr. Musick reached his conclusions after comparing the percentage of students who scored at the proficient level on state reading and math tests in 1994-95 with the percentage who scored at the proficient level on NAEP. He discovered, for example, that only 13 percent of Delaware's 8th graders met the state's 8th grade math standard, compared with 83 percent of 8th graders in Georgia. Yet on the state NAEP, 8th graders in Delaware outscored their Georgia counterparts. What, he asked, was going on here?
"I have never argued that state standards are wrong because they do not agree with the NAEP achievement levels," said Mr. Musick, who currently chairs the governing board that oversees policy for NAEP. "I have argued that state leaders should want to know why standards-based results are so different. When they know why, then they can decide if they believe their standards are about right or whether they need to be changed."
Similarly, a report released in January 2001 by the U.S. Department of Education on implementation of the 1994 ESEA requirements noted that "the percentage of students scoring at the 'proficient' level on state tests varies widely across states." And it cautioned that those differences "do not necessarily reflect actual differences in student achievement."
"Variability in the rigor of standards is a concern," concluded the report, "given the lack of evidence that states have benchmarked standards against common criteria."
How Good Is Good Enough?
The idea that states should set "performance standards" that define what students should know and be able to do at specific grade levels and how well they should be able to do it is relatively recent.
In 1988, Congress created the National Assessment Governing Board to set policy for NAEP. And it directed the board to make a fundamental shift in how NAEP scores were reported. Instead of simply measuring how students did on the test, lawmakers wanted to know how well students did against some defined standard for what students should know at a particular grade level, or "how good is good enough."
As a result, NAEP scores have been reported since the early 1990s by the percentage of students who perform at one of three achievement levels: "basic," "proficient," and "advanced," with proficient representing "solid academic performance" over "challenging subject matter."
When Congress renewed the ESEA in 1994, it similarly required states to set at least three performance levels to describe how well children were mastering the material in state academic standards. And it required states to move all children toward proficient over time.
By January of last year, however, only 28 states had actually set performance standards approved by the federal Education Department.
So when Congress again reauthorized the ESEA, in the "No Child Left Behind" Act signed by President Bush last month, it required states to set at least three performance levels (basic, proficient, and advanced) and to adhere to a strict, 12-year timetable for bringing all students up to the proficient level.
But because states are all over the map in setting the bar, meeting that deadline will be a far greater challenge in some places than in others. And it will mean something quite different from state to state if, and when, all students are proficient.
"Proficiency standards might be set anywhere from the top to the bottom of the test-score distribution," said Lorrie A. Shepard, the dean of the education school at the University of Colorado at Boulder and one of the principal authors of the Colorado study. "Unfortunately, most policymakers are not aware of how high some standards have been set and are inclined to treat all standards as if they were the same."
Some states took the cry for "world-class" standards literally. Like Colorado, they have set rigorous standards that describe what they would like their students to know and be able to do over time, rather than what most can do at present.
William J. Moloney, the Colorado commissioner of education, said the state's math standards were designed to reflect "not what children do know, but what they should know" and what is currently taught in other industrialized nations. As a result, he said, it was not surprising that the tests include material that most 10th graders have neither been taught nor mastered, because the tests were designed to drive the curriculum in new directions.
Rhode Island uses the New Standards Reference Exams in English and mathematics, whose content also was benchmarked against international standards. Dennis W. Cheek, the director of research for the state education department, said about 65 percent of the students in the state's highest-performing district are proficient across all grades, subjects, and years tested.
"That's an aggregate," he added. "So the actual figure is probably closer to 50 percent in terms of kids who are proficient in every test they've taken. So we have a long, long way to go."
When Mr. Musick did his analysis in 1996, Louisiana administered a relatively easy minimum-competency test with a very low passing score. While 88 percent of 3rd graders passed the state reading test in 1994-95, for example, only 15 percent of 4th graders met the proficient standard on the NAEP reading assessment that same year.
Lawmakers took note. When the state decided to revise its tests in the mid-1990s, state officials demanded that the standard be "of a rigor comparable to NAEP," said Scott M. Norton, the director of standards and assessments for the Louisiana education department.
Today, comparable proportions of students are proficient on the state and national tests. In mathematics in 2000, for example, 8 percent of 8th graders were proficient on the state test, compared with 12 percent on NAEP.
Now, Mr. Norton has a different concern. Is the federal government really going to require all Louisiana students to hit that high proficiency level by 2014?
"Our policymakers wanted to make sure that we could have a standard we could hang our hat on—some even called it the 'gold standard,'" said Mr. Norton. "However, we don't want to be punished for that." Rather than getting every single student to the proficient level, Mr. Norton is hoping for flexibility in how the federal law is interpreted.
In fact, Louisiana's goal now is to get all students to the basic level on its state tests by 2009. And students currently only have to perform at the "approaching basic" level to be promoted from grade to grade. Just getting most students to the basic level, Mr. Norton noted, "for us, was a pretty high standard."
Texas officials, meanwhile, initially set a very low passing bar on the Texas Assessment of Academic Skills and have slowly ratcheted up their expectations over time. Even so, in 2000, 87 percent of 4th graders and 90 percent of 8th graders "met minimum expectations," the threshold required for passing the test.
Now, the state is phasing in a new, more rigorous series of exams and must again grapple with how high to set the bar, a question other states will also face as they try to comply with the new ESEA. Under the law, states are free to revise their content and performance standards at any point.
But the law also requires a representative sample of 4th and 8th graders in every state to participate in NAEP reading and math tests every other year. While it does not say how those results should be used, they could exert public pressure on states to keep their standards high.
"No one is going to want their own state results to be much different from the NAEP, because it will raise a lot of questions," said Jeffrey M. Nellhaus, the associate commissioner for student assessment in Massachusetts.
But, he added, "to have a little flexibility here is not a bad thing. It's not like proficient has come down like the Ten Commandments, and we know exactly what it is."
Abigail Thernstrom, a member of the Massachusetts board of education, predicted that it would be impossible for all students to reach the state's proficiency level, which is roughly comparable to NAEP, even in a dozen years.
"It's a ludicrous goal," she said. "We will have to define proficiency way down. I mean way, way down." At present, many students are struggling to perform at the "needs improvement" level required to earn a high school diploma, she said last week at a meeting in Washington sponsored by the Thomas B. Fordham Foundation.
In a paper written in March 2000, the University of Colorado's Mr. Linn warned that the strategy of "shaming states" into getting their performance standards in line with NAEP was not necessarily wise policy.
For one thing, he argued, the curriculum frameworks on which NAEP is based were never designed to reflect what individual states have decided they want their students to know and be able to do. Consequently, the tests may measure different content. States also may have different purposes when they set passing scores on their own tests.
Louis M. Fabrizio, the director of the division of accountability services for North Carolina, notes that what NAEP classifies as "basic" is probably closer to what his state identifies as "proficient" performance. There's a good reason for that, he says. While NAEP defines proficient as "solid academic performance" over "challenging subject matter," North Carolina's proficiency standard refers to "mastery of material sufficient to be able to go on to the next grade."
Moreover, the NAEP governing board and the state use different technical procedures to determine where to set the bar. The governing board uses a "modified Angoff method," which, in part, asks panels of experts to classify test items based on what they think a proficient 8th grader should be able to do.
In North Carolina, teachers are asked to think of the actual students in their classes and, of those students who are doing proficient-level work, identify the math problems they could solve.
The stakes are also different. Students taking the NAEP exam do not receive individual scores. In North Carolina, by contrast, test results are used to help make promotion decisions in certain grades and to provide financial rewards and penalties for schools.
"If by waving a wand, we superimposed NAEP-like standards on our test, then we would, in essence, be saying that more than 70 percent of our kids may not be proficient enough to go on to the next grade," Mr. Fabrizio said, and that's simply not feasible.
A series of studies in the early 1990s severely criticized the NAEP achievement levels, arguing, in part, that the standards were set "unreasonably high," and that they were not grounded in any external, empirical evidence of their validity.
The problem, said Archie Lapointe, who once oversaw the NAEP contract for the Educational Testing Service, is that getting a committee of experts to decide what constitutes proficient knowledge and skills is relatively straightforward when it comes to occupations, "because everyone can agree what a skilled carpenter or a skilled mechanic should know and be able to do."
But it's much harder to transfer those same procedures to education, he said, because "there is no definition of what a competent 18-year-old is in American society."
"Ultimately, there's nothing scientific about this," he argued. "There are judgments that are made, and they're made by well-meaning people."
'Reasonable, But Realistic'
"The variability among the states is a phenomenon that has plagued us for decades," said Mr. Moloney, Colorado's commissioner of education. But "I think the decisive shift toward a culture of achievement and high expectations is the very best thing" about the reauthorized ESEA, he said.
In the long run, predicted Matthew Gandal, the vice president of Achieve, states will have to "decide what's a reasonable, but realistic, target."
"There's going to be as much politics involved as education," suggested Mr. Gandal, whose Cambridge, Mass.-based organization, formed by governors and business leaders, has reviewed the rigor of academic standards and tests in dozens of states, "because who can afford to set the bar at an unreachable level?"
He suggested it would be up to the business community and others to discourage states from taking the easiest political path and setting the bar low.
"The message seems to be, if you're trying to beat the system, the best thing to do would be to set a low standard," argued Mark D. Reckase, a professor of measurement and quantitative methods at Michigan State University in East Lansing, "so that you'd have a reasonable opportunity to get all of your students above that standard in the 12 years that you've got."
One solution is to make state tests and the process for determining what level of performance is "good enough" as open as possible, some experts say, so that the public knows what's being tested and what solid academic performance looks like. Then, people can judge for themselves.
"My bias is and continues to be that the best way for states and America to get standards right is to continually show the public what it is that students are being tested on, and how students do," said the SREB's Mr. Musick. "Over a period of years, with continual release of tests and results, state leaders and the nation collectively will come to agree on what standards should be."
States also might want to base their benchmarks on some empirical evidence. Achieve, for example, is trying to help states set graduation standards based on what young people actually need to know and do to succeed in college and in the workplace.
Ms. Shepard of the University of Colorado has suggested that states check on the reasonableness of their performance standards by looking at the actual distribution of how students score on their tests, as well as how the same students perform in other venues, ranging from other tests to actual examples of classroom work.
For now, many observers say, if the public is confused about what "proficient" means, it has a right to be.
"The larger community, which in many states like ours votes on education issues and tax issues, may have little or no information about the difficulty of these tests," said Monte C. Moses, the superintendent of the 43,000-student Cherry Creek school district in Colorado. He is a co-chairman of the Denver Area School Superintendents Council, which commissioned the analysis of the 10th grade Colorado exam.
"We need these high standards, and we need the challenging assessments," Mr. Moses added, "but we've got to do it in a commonsense manner where we're really being fair to students and teachers and schools. I think the thing we would wish for is a simple helping hand, and an acknowledgment that these issues do exist from the various leaders in our state capitals."
Coverage of research is underwritten in part by a grant from the Spencer Foundation.
Vol. 21, Issue 23, Pages 1, 14-15