Former New York Times columnist Richard Rothstein has emerged as one of the nation’s sharpest critics of the current test-centered approach to education reform. Six weeks ago I posted a review of his recent book, Grading Education: Getting Accountability Right.
I thought it would be great to hear his comments on the debates raging over how to fix NCLB, and proposals such as national standards. Here is part one of a four-part interview:
1. In Chapter 4 you describe how a student who scores as proficient in 8th grade math in Montana could go a few miles across state lines to Wyoming and be far below proficient. Because of such embarrassments, many education policymakers now advocate requiring states to adhere to higher, common standards. Would such a reform correct the problem?
The widespread call for higher common (or national) standards has little relevance to the problem it pretends to address – that states manipulate passing points on their tests under the pressure of NCLB’s accountability requirements.
Although proficiency data from various states are not comparable, we already have adequate means of comparing student performance in Montana and Wyoming and in every other state, at least in math and reading in the elementary grades. We can do so by using results from the National Assessment of Educational Progress (NAEP), a federal test given to a representative sample of students in every state.
Advocates of national standards typically confuse three things:
* standards,
* test coverage (or alignment), and
* cut points (proficiency, or passing scores).
Standards are descriptions of the knowledge and skill that teachers should cover in each grade. Tests reflect whether students have gained that knowledge and skill. Cut points are the number of questions on such tests that students must answer correctly to pass, or to meet accountability targets.
In practice, standards, tests, and cut points are often designed with little regard for each other.
Alignment of standards and tests
State test questions should cover a representative body of the knowledge and skill that state standards say should have been learned. Because a curriculum covers a large span of knowledge and skill, any test of one hour or so must select only a small portion of a year’s standards to assess.
Typically, states select only the simplest standards for tests used for accountability purposes. State officials claim that their tests are “aligned” with state standards because each question on a test covers something found in the standards. But when these questions cover only the simplest skills in the standards, students who do well on the tests may still not have learned a representative selection of what the standards say they should have been taught.
Making matters worse, many states have standards so comprehensive that they could not possibly be delivered in a year-long curriculum. These standards are “high,” but have little relationship to reality. Because a national standards-setting process would likely be controlled by elected officials and policy advocates, not classroom educators, efforts to establish high common standards will likely have even more fanciful results.
Many states have high standards and easy tests. Establishing high common standards will do nothing to solve this problem.
Cut points
Once a test has been adopted, NCLB requires states to establish a cut point, or passing score. With tests assessing the same underlying knowledge and skills, a state can have a high passing score, showing a small proportion of students “proficient,” or a low passing score, showing a high proportion of students “proficient.” A state can have higher standards and a low passing score, or lower standards and a high passing score.
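To make the arithmetic concrete, here is a minimal sketch in Python. The scores and both cut points are invented for illustration and come from no actual state test; `percent_proficient` is a hypothetical helper. It shows only that one set of results can yield very different "proficient" rates depending on where the passing score is set.

```python
# Hypothetical scores (number correct out of 50) for one group of students.
# These numbers are invented for illustration only.
scores = [12, 18, 21, 25, 27, 30, 33, 36, 41, 46]


def percent_proficient(scores, cut_point):
    """Return the percentage of students scoring at or above the cut point."""
    passing = sum(1 for s in scores if s >= cut_point)
    return 100 * passing / len(scores)


# Same students, same test, two different (assumed) passing scores.
lenient = percent_proficient(scores, 20)
demanding = percent_proficient(scores, 35)
print(f"cut point 20: {lenient:.0f}% proficient")
print(f"cut point 35: {demanding:.0f}% proficient")
```

Nothing about what the students know changes between the two lines of output; only the passing score does.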
Now let’s return to your Montana-Wyoming question. Even if we had high common standards, Montana could have a test that sampled an easier portion of the common curriculum, and Wyoming could have a test that sampled a more difficult portion of the common curriculum. Or, with a common curriculum and comparable alignment, Montana could establish a low passing score on its test and Wyoming could establish a high one. A large share of Montana’s students and a small share of Wyoming’s would then be deemed proficient.
If we want students in Montana and Wyoming with the same achievement to have the same chance of passing accountability tests, we need a national test, with questions drawn from the full grade-level curriculum and a single passing point, not national standards alone. Establishing a national test, however, is widely regarded as politically impossible. President Clinton proposed voluntary national tests, and even this was shot down. There is today a new attempt, led by the Council of Chief State School Officers and the National Governors Association, to create national standards. The success of this effort is uncertain; even less certain is whether, if states voluntarily adopted common standards, voluntary national tests would follow.
Already, the test-based accountability coalition is splintering on this issue. Former Secretary of Education Margaret Spellings, for example, recently denounced the call for higher, common standards because it will interfere with NCLB’s goal of closing the achievement gap (when defined as achieving a low level of proficiency) by 2014. She’s right, if national standards lead to requiring higher cut scores on a more difficult test.
Can state tests be “equated”?
Alternatively, we could require Montana and Wyoming to establish passing points on their respective, very different tests that reflect a similar achievement of knowledge and skill. If one state, for example, had a relatively easy test, NCLB could require a larger number of correct answers for passing; if another state had a relatively harder test, NCLB could require a smaller number of correct answers for passing. Precision in this exercise would not be possible, but it is technically feasible to determine what roughly equivalent passing points should be.
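One rough way to picture such an exercise is the sketch below, in Python. It assumes the same sample of students has taken both tests and picks the passing score on the second test that passes the same number of those students as the first test's passing score does. The function name `equivalent_cut` and all scores are hypothetical, and real equating studies are considerably more involved than this.

```python
def equivalent_cut(scores_a, scores_b, cut_a):
    """Return a cut point on test B that passes as many sampled students
    as cut_a passes on test A (ties on test B can pass a few more)."""
    n_pass = sum(1 for s in scores_a if s >= cut_a)
    if n_pass == 0:
        # Nobody passes test A, so set B's cut above every observed score.
        return max(scores_b) + 1
    ranked_b = sorted(scores_b, reverse=True)
    # The score of the n_pass-th best student on test B becomes the cut.
    return ranked_b[n_pass - 1]


# Invented scores for the same ten students on an easier and a harder test.
easy_test = [30, 34, 38, 40, 42, 44, 45, 47, 48, 49]
hard_test = [10, 14, 15, 19, 22, 24, 27, 30, 33, 38]

hard_cut = equivalent_cut(easy_test, hard_test, cut_a=40)
print(f"40 correct on the easy test is roughly {hard_cut} correct on the hard test")
```

In this invented sample the harder test ends up with the smaller passing score, which is the adjustment described above.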
But it is hard to imagine how this could be accomplished in practice. It would take considerable time and expertise to create such definitions – a sample of students would have to take both state tests, or a new common test, and their scores on each test would have to be compared – and when a state changed its test, the effort would have to be repeated. To equate the tests of many states would be more complex, and the processes would have to be repeated frequently because states must change their tests frequently to make the precise questions unpredictable and minimize “teaching to the test.” (Many states now change their test questions in minor ways, but don’t change the portion of the curriculum the test covers. Such changes do only a little to avoid teaching-to-the-test corruption, but would still require new equating studies to determine if passing points were similar.)
To the extent that tests in different states included questions that represented different aspects of a common curriculum, efforts to equate such tests would be impossible.
The Northwest Evaluation Association (NWEA) has a common test administered in some (but not most) states, and the NWEA has used its common test to compare the passing rates on states’ own tests. But because states change their tests, and passing points, frequently, an NWEA report can have only a very short shelf-life.
The bubble
States’ educational performances can differ, even if similar percentages of students were to pass identical tests. When NCLB holds schools accountable for getting students past an arbitrary proficiency point, some states and school districts can (and do) tell teachers to focus inordinate attention on students who perform at a level just below the cut point, to push those students, typically referred to as “on the bubble,” over the passing line. Teachers who pay extra attention to bubble students necessarily spend less time instructing children who are far below or already above the passing level. States where this takes place can have higher passing rates with lower overall performance.
We already know how students compare across the nation
We already have almost all the information we need to determine how student performance in math and reading in one state compares to that in another. The National Assessment of Educational Progress (NAEP) gives a common test to a sample of students in every state, in 4th and 8th grade, every other year. Several different test booklets are used; this makes it possible to sample a broader swath of the curriculum than would be possible if all students were given the same test. Because teachers do not know far in advance whether their students will be among those sampled, “teaching to the test” is less present for NAEP than for state tests. The underlying framework of NAEP (i.e., the implicit curriculum that NAEP assesses) is, in effect, the common standards that many people say we now need.
NAEP reports not only the average scores of students in each state, but also the distributions – for example, how students in the bottom quartile of performers in each state compare. NAEP reports the average scores of race and ethnic groups within each state, the average scores of boys and girls, and the average scores of children from low-income families. With all this information, and without explicit national standards or tests, we can easily compare the performance of students from the various states, and make inferences about the quality of each state’s educational and youth development systems.
Thus, from NAEP, we already know that the performance of students in Montana and Wyoming is almost identical. On state tests, 64% of 8th graders in Montana were deemed (in 2003) to be “proficient” in mathematics for NCLB’s accountability purposes, compared to only 11% in Wyoming. But NAEP also established its own common passing score, and reported that 35% of 8th graders in Montana were NAEP-proficient in math in 2003, compared to 32% in Wyoming.
Decisions about how many NAEP questions a proficient 8th grader should answer correctly are just as arbitrary as decisions about how many must be answered correctly on the Wyoming or Montana tests. There is no basis for saying that the NAEP proficiency definition is better or worse than the Montana or Wyoming definitions. But it doesn’t matter. NAEP’s arbitrary definition (and actual scale scores) gives us all the information we need to determine how student achievement in Montana compares to student achievement in Wyoming; national standards can add nothing to what we already know in this respect.
Why can’t NAEP be the national test?
As I mentioned earlier, NAEP is now given only to a small sample of students, but one large enough to reveal statistically reliable generalizations about the various states. Teachers do not know far in advance that their schools will be selected for NAEP, and so have no incentive to corrupt the test by preparing students for test questions rather than teaching the underlying curriculum. And because each test-taker answers only some questions in the overall assessment, NAEP can cover a fuller sample of the curriculum than if all questions were crammed into each test-taker’s allotted time.
Sampling students and the curriculum means that NAEP can report no individual student scores. It is not a national test.
A very dangerous proposal is to make NAEP a national test by giving it to every student nationwide. This would corrupt NAEP in the same way that state tests have been corrupted under NCLB. Knowing in advance that their students would have to take the test, teachers could prepare students for it, independent of teaching the underlying curriculum. Giving all NAEP test takers identical questions would permit educators to predict which aspects of the broad curriculum would more likely be tested, creating incentives to stress these aspects and overlook others.
Most states have reported dramatic gains in state test scores under NCLB. But these gains have not been duplicated in state NAEP results. Partly this is because, unlike state tests, NAEP’s framework (the implicit curriculum implied by NAEP questions) is not so disproportionately skewed toward the easiest skills. Also, because teachers are not so familiar with NAEP that they can predict particular types of questions, answers are a more accurate reflection of what students truly know and can do. These characteristics will be lost if NAEP becomes an individual student-level national test. We would then no longer have an independent monitor of the performance of American students, or an accurate way to compare students in the various states.
National standards are a quagmire
Establishing unnecessary common standards leads to a quagmire we will soon regret. The late 1980s and early 1990s saw a similar belief that national standards would improve American education. In math, the National Council of Teachers of Mathematics (NCTM) promulgated standards that were fiercely defended by some and attacked by others. Some states adopted them while others did not. Since then, math performance of elementary school students has climbed substantially. Indeed, math scores of black elementary school students on the NAEP have increased so much that they are now as high as whites’ in 1982. In other words, if white students’ scores had remained stagnant, the black-white gap would have been eliminated. Because NAEP has only recently been given to large enough samples to generate accurate state-level (as opposed to national) results, we can’t say whether the improvement was greater in states that adhered to the NCTM standards. Although there may now be more agreement about math instruction than 20 years ago, any attempt to re-introduce national mathematics standards could set off another round of “math wars.”
A fierce fight also developed over proposed national American history standards. Disputes between those stressing facts about political and economic leaders, those stressing the experiences of workers, women, and minorities, or those wanting students to interpret original source documents, persist today. A new attempt to establish national history standards will set off a similar war.
We have a recent example of how national standards can be politicized. Under No Child Left Behind, “Reading First” funds were used to establish implicit national reading standards requiring an excessively mechanistic curriculum. Corrupt administration of these funds by the Bush administration may have helped to discredit this approach, but an effort now to make this national curriculum explicit will set off unproductive battles between its advocates and those favoring “whole language” or “balanced” teaching.
And do we really want Congress debating whether evolution is only a theory? Proponents of national standards warn that without them, some states will adopt such an approach. Skeptics about national standards (like me) worry that with them, all states may be required to do so.
This ends part one of a four-part interview. Part Two will address the idea that the US is falling behind other nations in the race to succeed, and will be posted on Thursday, May 14th.
Richard Rothstein is part of the new project A Broader, Bolder Approach to Education.
What do you think of these ideas? What is your opinion of the push for “tougher” national standards?