A new federal law does not explicitly require that results from “the nation’s report card” be used as evidence to confirm progress on state tests, but its mandate that all 50 states now take part in the National Assessment of Educational Progress makes such comparisons more likely. And while NAEP can be used in that way, a new report points out, the process is not for the fainthearted.
The report, “Using the National Assessment of Educational Progress to Confirm State Test Results,” was prepared by an ad hoc committee of the governing board that oversees NAEP, a federally financed assessment, and was presented at the board’s quarterly meeting here Feb. 28 to March 2.
The reauthorization of the Elementary and Secondary Education Act, known as the “No Child Left Behind” Act of 2001, requires that a sample of 4th and 8th graders in every state take NAEP reading and mathematics tests every other year. Although the law is silent on how those results will be used, NAEP has been used informally for a long time to judge whether state standards and tests are rigorous enough.
What’s more, a fact sheet on the new ESEA, posted on the U.S. Department of Education’s Web site, says that state NAEP results will be used to help the department “verify” state test results for the Title I program. Title I, the main federal program for disadvantaged students, is the centerpiece of the ESEA.
To determine the feasibility of comparing state results on NAEP with trends on state exams, a working group, made up largely of testing experts, examined existing data from eight states and prepared case studies for three of them. The states were not named because state officials did not have the opportunity to review the analyses.
The working group also made a breakthrough in how to display gaps and gains on NAEP that reflect student progress—or lack of progress—across the entire spectrum of test scores.
The ad hoc panel recommends that “informed judgment” and a “reasonable person” standard be used in comparing results from NAEP and state tests, rather than a strict “validation” of test scores. The group reached that conclusion because a number of factors potentially could limit the degree of convergence between NAEP and state tests.
For example, the tests may cover different content, have a different mix of formats, define subgroups of students in different ways, and employ different sampling and reporting techniques. Students’ motivation to do well on the tests may also differ.
Based on its case studies, the committee found that, in general, NAEP results were moving in the same direction as scores on the state assessments. But the studies also underlined just how complex such analyses can be.
In one state, for instance, increases in the percentage of students reaching state standards were matched, often to a striking degree, by similar increases in the percentage of students reaching the NAEP “basic” level, both for the state as a whole and for the major ethnic and racial groups. Yet few data points were available to evaluate the consistency of NAEP and state test results when it came to closing gaps between subgroups of students.
In another state, NAEP results were inconsistent with results on the state tests, but no regular pattern existed. That suggests the tests may have very different characteristics, and any comparisons would be “extremely challenging,” the report says.
Meanwhile, in the third state, there were large changes in the percentage of black students scoring above the “basic” level on the NAEP math exam between 1996 and 2000, and those changes moved in the same direction as state test results. In reading, African-American 4th graders made advances on NAEP between 1994 and 1998, but mostly at the lower score levels, which was not apparent when looking only at the percentage of students at or above “basic.”
“I don’t think we have any doubts that NAEP will be useful for interpreting state arguments,” said Michael T. Nettles, a member of the National Assessment Governing Board and the chairman of the ad hoc committee. But, he cautioned, “it’s not a simple process.”
Given such complexities, the committee recommends that “states should be given the benefit of the doubt about whether their results are confirmed” by NAEP.
“Any amount of growth on the national assessment,” the report argues, “should be sufficient to ‘confirm’ growth on state tests.” In addition, the report urges that limitations in using NAEP to confirm state testing trends should be “acknowledged explicitly.”
The report also advocates that to maintain NAEP’s value as a “consistent, stable measure,” changes in its frameworks “should be made infrequently” and “only when the rationale for doing so is compelling.” To the extent possible, the report says, states and NAEP should use similar definitions to categorize students by subgroups, such as race and ethnicity.
And it warns that where the exclusion rates for testing students with disabilities or limited English proficiency are noticeably different between the state tests and NAEP, any results should be interpreted with care.
Finally, it argues that NAEP should conduct research to determine whether “over-sampling” of some subgroups is necessary to increase the accuracy of achievement estimates. NAEP currently does not report results for special education or limited-English-proficient students, for example.
Some board members also worried that using NAEP to confirm state test results would open the national assessment to harsher scrutiny than it has faced in the past.
“NAEP has never gored anyone’s ox,” said board member Edward Donley, a retired businessman from Pennsylvania. “But when the results start showing a difference between NAEP and state A, I fear that NAEP will come under attack.”
Much of the board’s time here in New Orleans was spent debating changes that should be made in the assessment, both to conform with new ESEA requirements and to strengthen the program.
The governing board, known as NAGB, approved a new schedule for NAEP that includes assessments in reading and math every other year, beginning in 2003. To accommodate that change, testing in some other subjects will be delayed. But the schedule reflects the board’s commitment to continue measuring performance in the full range of subjects NAEP normally tests.
The law also requires the Education Department to continue the NAEP trend assessment at ages 9, 13, and 17, in at least reading and math. There had been some concern that the governing board would recommend scrapping the NAEP trend assessment. (“Governing Board Considers Scrapping Long-Term NAEP,” Nov. 28, 2001.)
The new ESEA gives the governing board “final authority” on the appropriateness of all background questions on NAEP, in addition to its existing authority to review test items. The law specifies that the board must ensure that all NAEP questions are “secular, neutral, and nonideological.”
“I asked somebody who was involved in drafting the law, ‘What do those three words mean?’ and the answer was, ‘We don’t know,’ ” said Diane Ravitch, a board member and a research professor at New York University. “It’s not as if NAEP questions have been loaded with ideological or political imagery until now,” she added, “but those are the kinds of words that can cause a lot of problems.”
The ESEA also requires the governing board to establish a process for the active participation of teachers, parents, and others in the review of the assessment.
But the law drops a requirement that the board reach a “national consensus” regarding the frameworks that it uses to devise tests in particular subjects. The change prompted the board to pull back a request for proposals last month to craft a new reading framework, because the proposal required a “national consensus” process.
The new law also contains provisions that guarantee the public access to all NAEP questions under secure conditions; permit parents and the public to submit complaints about NAEP to the governing board for possible action; and require that parents be notified that their children may opt out of the test. (“New Law Lets Students Opt Out of NAEP,” Feb. 6, 2002.)
In some cases, the new requirements make explicit what had been implicit. For example, the governing board has always handled complaints on an informal basis, but now must set up a formal procedure and submit a summary of all complaints and responses to Congress.
Upon request, the board also has granted individuals access to NAEP questions under secure conditions. But Ms. Ravitch, at least, said she was worried that the more expansive language in the new law could threaten the security of the tests.
A Test in Spanish?
One unexpected curveball is that the ESEA considers Puerto Rico a state, both for the purposes of Title I and NAEP. Because the main language of instruction in Puerto Rico is Spanish, that may require the development of Spanish versions of NAEP math and reading tests.
Potentially, once such tests are produced, states with large Spanish-speaking populations also may want to use them for some of their students. Board members said they may need to ask Congress for additional guidance on what was expected.
Three decades ago, noted Gary W. Phillips, the acting commissioner of the National Center for Education Statistics, which is responsible for the national assessment, NAEP “was basically a research activity,” and few people paid attention to its arcane sampling and reporting procedures.
“Now,” he said, “you have to have not just the correct procedures, but procedures that are simple and public and understandable.”
Most observers said the change was positive, part of the maturation process that any set of government statistics goes through.
“NAEP has been very useful and helpful for 30 years,” said Archie LaPointe, the executive director of the school and college services division at the Princeton, N.J.-based Educational Testing Service. “The new NAEP will serve a different purpose,” said Mr. LaPointe, who once oversaw the NAEP contract for the ETS, “maybe even more successfully and more effectively.”
But others fear that the higher stakes now attached to NAEP—and the desire by states to look good on the assessment—could politicize the national assessment and threaten its integrity.
“That’s something of a worry to me,” said Robert L. Linn, a distinguished professor of education at the University of Colorado at Boulder.
The key to ensuring the assessment’s success, argued board members, may be in educating the public about the nation’s report card.
“NAEP’s importance is vastly greater. NAEP’s ability to play a significant role is vastly greater,” said Mr. Donley. “But only if we develop ways to communicate to millions of people in this country what this measurement system is all about.
“Unless we carry that vision out to every crossroads in the country,” he said, “our work will be irrelevant.”
A version of this article appeared in the March 13, 2002 edition of Education Week as “Want to Confirm State Test Scores? It’s Complex, But NAEP Can Do It.”