Question & Answer: RAND Researcher Assesses Effects of 'High-Stakes Testing'
Although many educators have warned that the pressure to raise test scores may result in higher test scores but not improved student achievement, few researchers have actually studied the effects of so-called "high stakes" testing.
In one of the first studies of its kind, Daniel M. Koretz, a senior social scientist at the RAND Corporation in Washington, and his colleagues at the National Center for Research on Evaluation, Standards, and Student Testing investigated whether a school district's high test scores present a misleading picture of achievement.
Over the past four years, the researchers examined test results from several districts and administered similar tests to determine whether the students' scores reflected actual performance in a subject area. As educators may have feared, they found wide gaps between the two sets of scores.
At last month's annual meeting of the American Educational Research Association, Mr. Koretz presented preliminary results from his study, which represented findings for 3rd graders in a large urban district with high concentrations of minorities and students from low-income families. He discussed the study with Associate Editor Robert Rothman.
Q. How much of a discrepancy did you find between test scores and achievement?
A. In some cases, it was quite large. The discrepancy was larger, generally, in math than in reading, which is not terribly surprising. We had in two cases a difference of 15 to 16 percentile points on a national distribution scale. ... In terms of academic months, that's a difference of eight academic months. For kids in the spring of 3rd grade, that's a sizable difference.
Q. What do you think caused the inflated test scores?
A. There is only one possible explanation: Instruction was closely enough tailored to the tests used in the district. Kids learned better the material that was on the test than they did the larger area [of curriculum content].
This has to be a case of narrowing of instruction people have been warning us about.
Q. Are there ways of designing tests to provide a more accurate picture of student achievement?
A. That's a contentious issue now. My answer is, it's not so much the test, but how it's used. [Some people say] if you replace the test with something else, you won't have that [narrowing of the curriculum]. It may in fact be worse.
[The reason is,] in performance assessments, there is relatively little similarity between performance across tasks. Performance on one task won't predict how well you'd do on another. Performance assessment often involves very few tasks, while paper-and-pencil tests have 100-some questions, even more. [In performance assessment], you don't get the effect of averaging out discrepancies.
People would like the problem to go away if you change the test that is used. I don't see any reason to be confident it will.
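The averaging point in the answer above can be illustrated with a rough sketch. It is not drawn from Mr. Koretz's study. It assumes, for simplicity, that a student's results on separate tasks vary independently with the same spread, and the function name and all numbers are hypothetical.

```python
import math


def standard_error_of_mean(task_sd: float, n_tasks: int) -> float:
    """Standard error of an average score taken over n independent tasks."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    return task_sd / math.sqrt(n_tasks)


# Hypothetical value: assumed spread of one student's score from task to task.
TASK_SD = 20.0

few_tasks = standard_error_of_mean(TASK_SD, 4)      # a short performance assessment
many_items = standard_error_of_mean(TASK_SD, 100)   # a long paper-and-pencil test

print(f"4 tasks: +/-{few_tasks:.1f}   100 items: +/-{many_items:.1f}")
```

Under these assumptions, the average over 100 items is five times as stable as the average over four tasks, which is the sense in which a longer test averages out discrepancies.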
Q. In a national test, which President Bush and others are proposing, would the problem of narrowing instruction be worse?
A. I would worry that a national test would produce the same things. There will be an incentive to worry about things on the test, and not worry about things not on the test. People will want to look good on the national test. ...
You could monitor the effect of tests, [however]. You could have another test, like the National Assessment [of Educational Progress], that people don't teach to. If scores on the test that matters to people go up sharply, and scores on the other test don't, you have to worry about inflated test scores.
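The monitoring idea in the answer above amounts to comparing gains on two tests. A minimal sketch follows. The function names, the threshold, and the sample gains are hypothetical and appear nowhere in the interview, and the sketch assumes both gains are expressed on the same scale.

```python
def inflation_gap(high_stakes_gain: float, audit_gain: float) -> float:
    """Gain on the test that matters minus gain on the test nobody teaches to."""
    return high_stakes_gain - audit_gain


def looks_inflated(high_stakes_gain: float, audit_gain: float,
                   threshold: float = 5.0) -> bool:
    """Flag a gap larger than an assumed, arbitrary threshold (in scale points)."""
    return inflation_gap(high_stakes_gain, audit_gain) > threshold


# Hypothetical gains, in the same scale units on both tests.
print(looks_inflated(high_stakes_gain=12.0, audit_gain=1.0))  # sharp rise on one test only
print(looks_inflated(high_stakes_gain=3.0, audit_gain=2.5))   # both tests move together
```

A flagged gap would be a reason to look more closely, not proof of inflation. The two tests may also differ in content or in the students who take them.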
Q. In presenting his proposal for "American Achievement Tests," Secretary of Education Lamar Alexander said they would be tests "worth teaching to." Is that an appropriate strategy?
A. A test can be worth teaching to and still give an inflated picture of what students know. ...
There are two separate issues. One is, are we giving tests worth teaching to? The other is, if people are teaching to them, good or bad, what sense can we make of the results? I'm afraid they have not been separated as much as they should be.
Vol. 10, Issue 33