You make some good points about the distinction between norm-referenced tests and criterion-referenced tests, but I disagree with your characterization of the latter.
The problem with norm-referenced tests, I think, is that they never really show much progress. On a fourth-grade test, half the children will score above the norm and half will score below it. It may be useful to know what the norm is, but on its own it is misleading. I recall that for many years the New York City Board of Education reported norm-referenced scores, and the newspaper headlines would scream that half the students in a given grade were “below grade level.” Since the norm was established precisely so that half would be “below grade level,” such a result was predictable. Yet the public and the news media never understood that the test was designed to produce that result.
The promise of criterion-referenced tests is that the test-makers presumably determine in advance what students “ought” to know and be able to do in a given grade. I hope I am not doing a terrible injustice to the field of psychometrics with this explanation, but I suggest that a good criterion-referenced test would be akin to the test for a driver’s license, whether it is a written test or a performance test. The applicant must reach a certain score or does not get a license; the scores are criterion-referenced, not norm-referenced. It is possible that everyone might get a driver’s license, if all the applicants know the state’s laws and can perform the driving operations that the test-makers expect. And it is equally possible that everyone might fail. If we want safe roads and only qualified drivers getting licenses, then we should want a criterion-referenced test, not a norm-referenced one.
Consider the same question in terms of body weight. If everyone is grossly overweight, then the “norm” is to be overweight. But if health experts set a desirable weight range for, say, a 40-year-old woman who is 5’6”, then that range is the optimum, regardless of what the norm is.
You describe these judgments about what students should know and be able to do as “political,” because they rest on expert judgment, including the judgment of teachers of students in a particular grade. The NAEP standards are based on expert judgments, and when I last served as a member of the National Assessment Governing Board, the process of setting standards was managed by the American Institutes for Research in a very professional manner.
Knowing that the standard-setting was done by professionals and involved the judgment of nonpartisan people, I am uncomfortable seeing this process described as “political.” Calling it “political” suggests that some politicians rigged it to make it too hard or too easy. I don’t accept that, because I have seen the process and know that it is insulated from political influence.
Now, having defended the process, I’ll pass along a bit of hearsay that will stoke your fire as a critic of standardized testing. I recently attended a social event at which I met a long-time employee of the New York State Education Department. I had known this person off and on over the years, though not well. When we got into a discussion of the state test scores, she lowered her voice and said, in words to this effect, “When the scores come in, they are ‘adjusted.’ If they are too low, they are raised. They are anything that state officials want them to be. Then they are released. It’s a no-brainer to get high scores.” When we turned to the subject of graduation rates, she confided that it was easy to make them high for small schools: “Just make sure that the high-scoring kids, even in the poorest neighborhoods, go to the small schools, and the remainder are assigned to the large schools. Another no-brainer.”
This conversation reinforced my view that we need national standards and national testing, and that the tests should be conducted by officials with no reputational stake in the results. Until we have national testing, we will continue to have this bizarre situation in which the states report remarkable progress while NAEP scores remain flat.
I don’t think that scores on a national test should be the sole measure of student progress. Such scores are important as indicators, but they should be used in combination with (as you suggest) grades on written work and on examinations conducted by teachers.
As long as we continue to depend on state and local officials to grade themselves, we will live in a constant condition of grade inflation.
As to curriculum, I don’t think we can have a common civic culture, a common democratic culture, without some shared knowledge, shared discussions, shared poems, and shared history. I know it is hard, but it is not impossible to agree on what should be shared and to recognize that the shared part of the curriculum need not consume more than about 40 percent of each subject: history, literature, math, the arts, and science. Certainly in math and science, we do not expect every teacher to make it up as he or she goes along. Every discipline has a recognized body of knowledge (I know that term makes some people cringe, but not me!), and that body of knowledge changes over time, sometimes slowly, sometimes rapidly. It would be a shame if a student were to spend a year in a science class with a teacher who made it up as he went along, with no reference to what anyone else in the field has learned.
There is such a thing as “standing on the shoulders of giants,” and this is what a good education enables one to do, or so it seems to me. If you stand on the shoulders of giants, as the saying goes, you can see a lot further.
The opinions expressed in Bridging Differences are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.