Reconsidering Standards and Assessment: 'Standards' Should Mean 'Qualities,' Not Quantities
With all the current talk of the need for "raising standards" in education and establishing "national standards," we must exercise a bit of caution--not only because no one thinks their standards are too low, but because too many people mean nothing more than that test scores should be raised.
To see test scores as the key indicator of educational well-being is a glib response to the problem of standards. Is it too cranky to say that, in the last 20 years, the most massive investment in testing ever undertaken has coincided with a palpable decrease in the quality of education? Can an increase in testing ever yield improved quality in schools? To suggest that it can is akin to saying that more accounting results in higher-quality products or services in business, or that more taking of one's temperature will lead to better health.
Let us re-inject some common sense into the debate about school quality. Let us reconsider just what we mean when we speak of "standards" before we embark on yet another quest delivering only more standardized data collection instead of better schooling.
Standards refer to qualities, not quantities. As the history of the word reminds us, a "standard" is a set of values around which we rally; we "defend" standards. (The "standard" was the flag held aloft in battle, used to identify and orient the troops of a particular king.) A psychometric tactic has caused us to lose sight of quality: Thinking of "standards" as the setting of a cutoff hides the fact that standards represent differences in kind, not degree--desirable behaviors, not the best typical behavior (or "superior mediocrity," as John Dewey once termed it).
What are the signs of a student or school with high standards? Most observers' answer to this question would have nothing to do with the degree to which traditional "content" has been learned. A typical response would suggest that students with high standards are diligent, thoughtful, engaged, persistent, and thorough--no matter what they learn.
Such a description does not mean that students with "high standards" are merely those who happen to be in the first quintile on tests; this utterly confuses cause and effect. Rather, their work and conduct regularly display qualitative differences from those of their peers. Students with high standards resist the tendency to be satisfied with slapdash work or merely "correct" answers.
And to equate high standards in a school with high test scores is fallacious reasoning. Many quick and even gifted students are disengaged from their work. They do not perform to high standards, even if their scores seem to say otherwise--just as many schools, blessed with bright children from well-to-do families, can hide their sins behind high test scores. Other students, because of poor prior schooling or slow learning styles, may have mastered less content than their peers; so what? Their daily work may still be regularly done to the highest standards.
I do not mean that there isn't a "core" worth learning. But while a curricular framework provides standards for ensuring that students are given high-quality assignments, it can provide no guarantee that the work students produce is of high quality. That depends on their receiving exemplary assignments and assessments. When multiple-choice tests drive instruction, many students go from one teacher to the next without gaining the self-discipline required for working at a high standard. Many teachers then cite the self-fulfilling prophecy that kids cannot work to higher standards, and the vicious cycle continues.
Standards are revealed in the everyday behaviors and policies of a person or school. To have "high standards" in intellectual affairs is to live out one's virtues with consistency--particularly in the face of daily hassles or bureaucratic constraints. I am interested not in whether students can cram for a high-pressure test but in what they are wont to do when the local and state authorities aren't watching.
The good school is a community on a quest for excellence; it seeks out and encourages its laggards. We would therefore expect to find "quality" in institutions by the coherence of the institution's overall performance--the habits and behaviors regularly revealed by all its members. A school with high standards, then, is recognizable not simply by the work of its best students and teachers but by the small gap between the work of its best and the work of its worst. Where is an accountability program that honors this basic truth about institutional quality?
We regularly confuse "standardized" tests with tests that check whether students' work is "up to standard"; we confuse "standards" with "standardized measures." A "standardized" test provides a uniform set of procedures and questions for purposes of making "valid" comparisons. It is a different matter--a matter of intellectual values--to ask: "What are 'standards of intellectual conduct' and do students display them in whatever work they do? What 'measures' will do most justice to our 'standards'?"
Seeing if we are "up to standard" does not require measures that are rigidly "standardized." Psychometric considerations now so dominate test design that the demand for uniform "measures" has corrupted the "standards." By defining the standard in terms of a standardized measure, we have turned quality on its head.
This statement seems less polemical and more insightful when we recapture common sense. Recall how the best colleges, graduate and professional schools, and businesses determine whether student or employee work is up to standard. Students and employees are judged on their own idiosyncratic, contextualized performance. The process may be uniform, but the questions and possible answers are not standardized. In most other countries, in fact, the major student assessments require extensive, open-ended writing; the tasks are of high quality, and do not require and will not yield machine-readable answers.
Answers to multiple-choice test questions cannot reveal students' qualities of mind and action. I learn little about students' standards from a test in which the right answers need only be chosen--whether or not they get the questions right. Standards are revealed in the manner by which work is approached and completed, over time. The only "quality" visible in such test results is the rightness or wrongness of answers. No multiple-choice item can show whether a student's answers derived from thoughtful understanding and good habits, dumb luck, or native cleverness hiding bad habits; no "norming" process can substitute for examining the student's habits directly.
At the highest levels of policymaking, there is confusion on this point. Many educators still erroneously think that changing test items--the "input"--enables us to better guarantee the quality of students' work--the "output." For example, according to the Dec. 13, 1989, issue of Education Week, the governing board of the National Assessment of Educational Progress "is considering a plan to set national goals for student performance . . . . Under the plan, the board would determine the skill and knowledge, as measured by NAEP test items, that ought to be mastered at each grade level."
The staff proposal "recommends that the board establish an advisory panel to examine the actual questions on the 1990 math assessment and determine which ones students need to answer correctly in order to reach the different performance levels."
This is confused thinking, driven only by a desire for expediency in assessment. The NAEP scales are a great idea, though still needing technical and empirical work; they are, in principle, necessary but not sufficient for judging quality. Compare this use of such scales with their use in diving or gymnastics: We do not assume that higher quality automatically results from adequately tackling more difficult work. The athlete's score is determined by multiplying the degree of difficulty by the score for the quality of the performance.
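The diving rule can be written out as a minimal sketch. The class name, the field names, and the sample numbers below are illustrative assumptions, not NAEP's scales or any official scoring table; the only thing taken from the argument above is that the score is difficulty multiplied by quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Performance:
    """One judged performance (hypothetical structure, for illustration only)."""
    label: str
    difficulty: float  # how demanding the attempted task is
    quality: float     # judged quality of execution, on an assumed 0-10 scale

    def score(self) -> float:
        # Diving-style rule: degree of difficulty times quality of performance.
        return self.difficulty * self.quality


# Made-up numbers: an easier task done superbly, a harder task done poorly.
easier_done_well = Performance("easier task, done superbly", difficulty=2.0, quality=9.0)
harder_done_badly = Performance("harder task, done poorly", difficulty=3.0, quality=5.0)

for p in (easier_done_well, harder_done_badly):
    print(f"{p.label}: {p.score()}")
```

With these made-up numbers the easier task scores 18.0 and the harder one 15.0: attempting more difficult work does not by itself produce the higher score, because quality is judged separately and then combined with difficulty.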
Until we have tests that center on the qualities of students' answers, we will lack the evidence necessary to judge whether a high score is good enough.
The technology of multiple-choice tests means that test "items" can never meet intellectual standards, even if they meet psychometric standards. "Items"--as the very word implies--can never be exemplary assessment tasks, even if they sample from an exemplary domain of "content." Students need only recognize the right answer from the multiple choices.
But can they produce high-quality work when the cue of the four choices is removed? Do they possess the skill required to call forth and integrate the bits into a whole, or the good judgment to know which element of their repertoire is required when? Do they have the discipline to persist in fashioning that whole?
The question for quality-assessment design must therefore be: What kinds of task are worth undertaking?
Building tests on the framework of a bell curve of results ensures that the system as a whole will never have high standards. Raising standards means involving students and adults in recalibrating their efforts against specified criteria of masterful performance, and judging success by the progress they all make in moving toward exemplary performance.
There is no reason that deliberate instruction should yield a standard spread of test results statewide, as if educational effects were random. Indeed, we will have succeeded in raising standards only when we alter the shape of the standard curve; the bulk of the scores should shift to the right, toward the high end of the scale.
True progress depends on identifying shared, stable standards that illuminate daily work. At present, there are neither incentives nor structures to ensure that grades correspond to fixed performance criteria. There is no "inter-rater reliability" among teachers: What gets an 86 in one room can get a 72 down the hall, never mind at a different school. It was precisely the unreliability of the transcript that led to standardized tests in the first place.
Then, to the confusion of practically everyone, states and districts periodically "re-norm" their standardized tests, and scores go down. How can student or teacher performance improve under such conditions?
To assume that schools of "high standards" are those where teachers grade on a steep standard curve--as many "demanding" teachers do, and think they ought to--can only increase the gap between best and worst scores without improving quality. The most effective strategy for raising school standards is to devise a system that rewards schools for grading with criterion-referenced standards and achieving a high degree of inter-rater reliability, and that rewards students for progress according to these standards.
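What a check on inter-rater reliability might involve can be sketched roughly, assuming two teachers grade the same set of papers on a 100-point scale. The function names, the 5-point tolerance, and all of the grades except the 86 and the 72 mentioned above are hypothetical; formal reliability statistics exist, and this is only the simplest possible comparison.

```python
from statistics import mean


def _check(grades_a, grades_b):
    # Both teachers must have graded the same, non-empty set of papers.
    if not grades_a or len(grades_a) != len(grades_b):
        raise ValueError("need two non-empty grade lists of equal length")


def mean_absolute_gap(grades_a, grades_b):
    """Average number of points by which the two teachers disagree per paper."""
    _check(grades_a, grades_b)
    return mean(abs(a - b) for a, b in zip(grades_a, grades_b))


def agreement_rate(grades_a, grades_b, tolerance=5):
    """Share of papers on which the two grades fall within `tolerance` points."""
    _check(grades_a, grades_b)
    close = sum(1 for a, b in zip(grades_a, grades_b) if abs(a - b) <= tolerance)
    return close / len(grades_a)


# Hypothetical grades for the same four papers; only the 86/72 pair is from the text.
teacher_a = [86, 90, 75, 68]
teacher_b = [72, 88, 70, 80]

print(mean_absolute_gap(teacher_a, teacher_b))  # 8.25
print(agreement_rate(teacher_a, teacher_b))     # 0.5
```

A school rewarded for reliability would want the first number to fall and the second to rise as its teachers calibrate their grading against the same criteria and exemplars.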
Standards cannot be raised unless they are demystified. Our years of being subjects of and then abettors to "secure" tests have dulled us to the foolishness of using secret standards and measures.
Secrecy is dysfunctional if the aim is bettering performance. Imagine trying to improve as a basketball player if one were "tested" by playing the heretofore secret game on the last day of a basketball "course."
The use of tests from outside vendors--and the "security" that protects the product's marketability, not just its validity--ensures that neither students nor teachers possess what they most need to raise standards: models of exemplary assessment and performance. They must know what mastery at exemplary tasks actually looks like--very different from our current practice of summarizing the curriculum and the general outcomes we intend, and following up with "secure" tests. The ultimate test of whether standards are demystified is whether students and teachers can accurately assess their own work on a regular basis; to do so, they must be able to compare their performance with exemplary work.
Higher standards do not mean higher dropout rates. The claim that they do, made by many critics of school-reform efforts, betrays a fundamental confusion about what a "standard" is. A standard is an exemplar; whether few, many, or all students can meet or choose to meet it is an independent issue, calling for separate strategies and incentives. We must first have a rich vision of the possible.
At least give NAEP credit for trying to invent stable criterion-referenced scales, despite the technical nightmares of doing so with matrix sampling and cross-age, multiple-choice testing. In NAEP's mathematics results, we discover, for instance, that only 6 percent of American students can work at "level 350," the highest level of problem-solving found in the test. The point is not to lower this standard because it is too hard to meet, but to set targets for the number of students who will meet the appropriate standard in years hence. (My enthusiasm for these scales should not, however, be construed as endorsement of the tests themselves.)
Any successful school reform must reinforce the belief of students, teachers, and parents that the tasks and scoring criteria are in reach and worth mastering--and not simply because the state says so. As the aim of assessment, students must internalize not "our" standard but "the" standard--just as it works in judging diving and debate.
Immature or weak students respond to such a system: Watch them play sports, musical instruments, and computer games. They may "fail" regularly at replicating the performances of their models and heroes, but the absence of invidious personal comparisons and the opportunity to gauge progress according to clear standards ensure that the proper incentives exist for their continued striving toward competence.
Vol. 09, Issue 18, Page 36