Misunderstanding Meaning, Users Overrely on Scores

I was stunned when Barbara (names of students in this essay are fictitious), an applicant to our graduate program, was rejected. Letters of recommendation from people we knew and respected testified that she was a stellar researcher, and when I read her work, I thought it excellent. Our program emphasizes research; Barbara produced superior research; since the best predictor of a given kind of behavior in the future is the exhibition of the same behavior in the past, Barbara should have been accepted. Because her aptitude-test scores were lower than those of students normally admitted to our program, however, she was rejected. The members of the admissions committee who voted against her acceptance didn't want to take the risk. For them Barbara was, at best, an "overachiever," someone whose achievement exceeded her ability and whose low ability would catch up with her in time.

The mystifying concept of overachievement begins to suggest the fallacy of overreliance on standardized intelligence tests. What does it mean to say that someone achieves at a level above that of his ability? The term signifies a problem not with the test-taker but with the test or other predictor. With Barbara, the test (in this case, the Graduate Record Examination) was missing something important: Her achievement in research was spectacular.

Barbara's case raises two fundamental questions about tests of mental ability. First, why do decisionmakers pay so much attention to ability-test scores, even in the face of blatant counterevidence to the grim predictions that might be made from a given score? Second, how can someone as capable as Barbara turn in such a poor performance on a standardized test? As the answers to these questions will show, such tests are not simply overused as predictors of future performance but are in fact deficient in their measurement of intelligence.

Confusion of signs with significates partially explains the overreliance on test scores; interpreters of intelligence tests often fail to distinguish between an indicator of something and the thing itself. Everyone knows that a smile is a sign of happiness. But it is possible to be happy without smiling, just as it is to smile without being happy. Similarly, test scores are only a sign of intelligence. Some test-takers possess intelligence that tests do not measure, while others earn scores that overstate their intelligence. The originators of the early intelligence tests--scholars such as Alfred Binet and David Wechsler--recognized that intelligence is reflected in what we do in our everyday lives and that test scores are merely a sign of ability. Many of their disciples, however, seem to have forgotten what the masters knew.

Misunderstandings of the relationship between sign and significate sometimes result in ludicrous misuses of tests. For example, when I worked at the Psychological Corporation one summer, we received a complaint from a woman about to graduate from a teachers' college in Mississippi. She had received a score on the Miller Analogies Test one point below the school's cut-off for admission. She was admitted anyway because of other "extraordinary" credentials. Several years later, however, after she had completed honors work, the school was ready to withhold her diploma pending her re-taking the test and receiving the requisite score. The predictor had become a criterion for graduation.

Conversely, not all students who earn high scores perform up to expectation. Alice, for example, had tested well and gained admission to our graduate program. In fact, during her first year in the program, she performed as well as the test scores had predicted. By the time she had finished the program, however, she ranked perhaps in the 30th or 40th percentile of our graduates. Clearly, something had gone wrong with this "risk-free" admission, and the something wrong wasn't motivation. She was highly motivated to the end. Later, I will suggest the kinds of abilities necessary for success in the program that Alice lacked (abilities that Barbara, I believe, possessed), but for now, I want merely to point out that Alice was not an isolated case.

Beyond confusion about the signification of scores, a second cause of the misuse of tests is decisionmakers' fear of culpability in the event that a candidate like Barbara doesn't measure up. If a student with high test scores fails to achieve, the admissions officer can always blame the Educational Testing Service, or whatever publisher provided the admissions test. How could the admissions officer be expected to know that the person wouldn't work out?

On the other hand, should a student like Barbara fail, the blame would seem to fall right into the admissions officer's lap. After all, he or she was the one who made the decision to admit in the face of poor test scores. Few people want to risk taking this kind of blame, especially when there are other candidates whose test scores make them less threatening as prospects for admission.

Third, their own backgrounds can influence admissions officers. If they are in the position of making such decisions, they must themselves, some years earlier, have been accepted either into that program or a comparable one. And if they were accepted, they must have had reasonably high test scores. It is human nature to judge others by the criteria of our own achievement. Admissions officers, evaluating candidates in the light of their own scores, might find many who fail to measure up.

Fourth, schools and programs within schools often publish average test scores and other statistics, such as ranges of scores or other measures of score dispersion. In a relatively small program, just one or two Barbaras might wreak havoc with the test statistics. Should mean test scores drop substantially, one's program may no longer look so competitive. Concerned about the program's apparent decline in quality, strong candidates might start going to other schools.

Fifth, users of test scores too often exhibit what I call the "rain-dance mentality." Suppose that a town is encountering a serious drought. I promise the citizens that, for a fee, I will bring them rain. I do a rain dance. When it doesn't rain, the townspeople want their money back. I point out to them that the current drought is a heavy one and that it might take two, three, or more applications of the rain dance to get rain. I perform more dances, and eventually it rains. I cheerfully point out that I have, indeed, brought rain. The story suggests that superstitions remain in force because they are not refutable: No evidence would convince their adherents of their untruth. The same lesson applies to low test scores.

While many institutions do not have official cut-off scores on tests, the people who make decisions about admissions frequently have tacit cut-offs. They may secretly believe--perhaps without acknowledging the fact to themselves--that candidates with scores below a certain level can't do the work. These applicants are not admitted. The admissions officers can then truthfully say that no one with scores below the tacit cut-off ever successfully completes the work. Of course, no candidate who fails to earn the minimum score ever has the chance to show that he can succeed in the program. The superstition cannot be disproved.

Misuse of tests stems also from users' confusing the part with the whole. The skills measured by intelligence tests--primarily memory and analytical reasoning--do not constitute all of intelligence. Yet, though existing tests tell us only parts of the story, we often act as if we have the whole.

We can now address the second question posed at the beginning of the essay: What aren't the tests testing? How can we explain the discrepancies between test results and actual accomplishments in the cases of Barbara and Alice? Viewed in terms of my own "triarchic" theory of human intelligence, standardized mental-ability tests measure only one of the three aspects of intelligence, and they measure that aspect (the analytical skills in which Alice excelled) imperfectly. The tests do not measure synthetic or insightful-thinking skills of the kind possessed in abundance by Barbara, nor do they measure practical intellectual skills of the kind possessed by a third student, Celia.

When we admitted Celia to our program, we did not anticipate a distinguished performance. Sound though her qualifications were, none looked outstanding. In fact, however, Celia proved to be a great success when it came time to enter the job market. She succeeded because of her practical skills: She made sure that she was doing the kinds of things that the program rewards, that she had three good letters of recommendation nailed down, and that her resume would look just right for winning her the kind of job she wanted. Celia could go into an environment, figure out the reward system, and then adapt to it. Important as such adaptive skills are for everyday life, they are not measured by current standardized tests.

When we look for bright students, we generally look for Alices, rather than for Barbaras or Celias. The selection and placement system in education is heavily weighted in favor of applicants who test well. We often pass up students who may possess other strong qualifications, if not high scores, either because we don't recognize their talent or because we perceive them as too risky. In fact, through our system of rewards and punishments, we create Alices. It is Alice who wins the love of her teachers, earns high scores on tests, and, ultimately, gains admission to the school of her choice. The Barbaras and Celias receive fewer rewards and may even be punished for counternormative behavior. The time has come to create Barbaras and Celias as well as Alices, and to recognize their talents in a broader program of testing.

My own Sternberg Multidimensional Abilities Test, to be published in 1989 by the Psychological Corporation, will be a step in this direction. Still, in evaluating results from this or any other test, users must remember the fundamental principle of prediction: The best predictor of a given characteristic or behavior in the future is the exhibition of that characteristic or behavior in the past. No test can indicate the probability of future success as reliably as a record of accomplishment.

Vol. 07, Issue 03, Page 28