To the Editor:
The article “Principals Criticized on Teacher-Retention Decisions” (Aug. 8, 2012) contains the following statement about results issued in a report by TNTP, formerly The New Teacher Project: “Of the teachers studied, the group identified a subset of about 20 percent as ‘irreplaceables’ because their students made two or three more months’ worth of academic progress compared [with] those taught with the average teacher in the district.” The statement is fundamentally flawed. It reflects a common misunderstanding fostered by the testing industry, which prefers to report results in months and years of achievement when all it actually has is the number of items scored as correct on a particular test: the raw score.
We do not know what an achievement test measures. A person’s score on a test of general mental ability and that person’s score on an achievement test, which is supposed to reflect what a student has learned in school, are almost interchangeable. A person who scores high on one test will also score high on the other, even though the content of the two tests is quite different. A Texas researcher recently reported that the pattern of results on 100,000 achievement tests was best explained by a “latent trait” he called “test-taking ability”; the test results had little to do with instruction. Depending on the scale and the test, a “month” of instruction may mean little more than one or two additional correct items.
In the literature, I have found little or no empirical support for converting the raw scores for a given grade into months and years. There is no evidence that a change in raw scores correlates with months of instruction, much less with the effectiveness of instruction.
Items on achievement tests are supposed to represent aspects of the state’s educational standards. If we look only at raw scores, we cannot tell what it means educationally that a student got two or three more items correct. Which of the standards do those additional items represent? What can a student taught by an “irreplaceable” teacher do that one taught by an average teacher cannot? Moreover, each specific state standard is represented on the typical test by too few items to be a statistically meaningful sample of mastery of that domain of knowledge. If the pattern of correct items is highly similar among students taught by an “irreplaceable” teacher, how do we distinguish that result from coaching?
The uncertainties of achievement testing are so great that we must question how the tests are used, especially when the language in which their results are reported is highly misleading.
Murray Levine
Distinguished Service Professor, Emeritus
Department of Psychology
State University of New York at Buffalo
Buffalo, N.Y.