Why Education’s Most Famous Test Fails the Test
Whatever the outcome of the College Board’s investigation into scoring errors affecting thousands of students who took the October SAT, one thing is certain: There is no justification for the continued use of this psychometric icon.
To date, the College Board has acknowledged that Pearson Educational Measurement, the contractor responsible for scoring the SAT, fed damp answer sheets through its scanners. If that alone were the extent of the problem, the matter would hardly qualify as life-threatening for the gatekeeper to the nation’s colleges and universities. But the latest contretemps is symptomatic of a more fundamental problem.
Its roots go back to the start of World War I, when the country was faced with the urgent need to quickly identify officer candidates. The military found itself ill-equipped for the challenge, and turned to the American Psychological Association for help. Working out of the Vineland Training School in New Jersey, a special committee came up with the Army Alpha. This standardized aptitude test presented recruits with verbal and mathematical questions that allowed them to be ranked according to their intellectual abilities.
Recruits who scored high were sent to officer-training school, while low-scoring recruits were given other military duties. In all, some 1,750,000 men took the Army Alpha, which proved enormously successful for the task at hand. Encouraged by the efficiency with which large numbers of subjects could be sorted out, designers of standardized achievement tests decided to apply the Alpha approach to whatever subject content was being measured.
It’s at this point that standardized testing entered the morass that it still finds itself mired in today. The cause is the widespread misunderstanding of the difference between an aptitude test and an achievement test. The former is designed to predict how well a test-taker is likely to perform in a future setting. In contrast, the latter is designed to measure the knowledge and skills that a test-taker possesses in a given subject. While a test-taker’s scores on the two types of test may be related, a high score on one does not necessarily mean a high score on the other.
This confusion is reflected in the changes over the years in the SAT’s name itself. In 1926, when the test was first conceived by the psychologist Carl C. Brigham as an instrument for promoting greater meritocracy, it was called the Scholastic Aptitude Test in the belief that it assessed innate ability. But by 1994, the College Board had second thoughts. It renamed its premier brand the Scholastic Assessment Test out of concern that the original designation was associated too often with eugenics. In 1997, however, the board did some serious soul-searching and altered the name to simply the SAT, which stands for nothing. Changes in brand names with high consumer recognition are always a risky proposition. But the College Board had no choice. It had to put marketing ahead of pedagogy because it was feeling intense heat from critics hammering away at the contradictions in statements issued over the years about the SAT. These fell into two broad categories: its “coachability” and its predictive value.
Since 1946, when the SAT began to gain widespread use, the College Board had insisted that the test was not coachable. But Stanley H. Kaplan, who went on to establish the test-preparation company bearing his name, proved that this was not the case by helping students in his Brooklyn neighborhood dramatically boost their scores. His secret was constant practice. Until the late 1980s, when the College Board finally was forced to release copies of old tests to the public, Kaplan had to rely on reconstructing a typical SAT from items that students recalled in post-test meetings.
Stung by Kaplan’s embarrassing record of success in preparing students for the SAT in the face of its sponsor’s bold denials, the College Board was finally reduced to playing its trump card in the form of a claim that the SAT had predictive value. But Bates College, in 1984, was about to shatter that dubious assertion.
Despite existing practices among highly selective colleges and universities at the time, Bates College, in Lewiston, Maine, decided to engage in a pioneering experiment. By making the submission of SAT scores optional for students seeking admission, Bates wanted to determine whether the vaunted test was worthy of the pronouncements about it. In the fall of 2004, the college announced that its 20-year study had found virtually no differences in the four-year academic performance and on-time graduation rates of 7,000 submitters and nonsubmitters of SAT results. Mount Holyoke College in Massachusetts and other prestigious schools have reported similar results, swelling the ranks today to more than 700 like-minded schools.
For educators, the news confirmed what they always knew. As long as the SAT promises to provide the basis for comparisons among applicants for admission, its makers have to avoid loading up the test with items measuring the most important content emphasized by teachers. If the test largely included such items, which constitute the main justification for testing in the first place, the SAT would run the risk of having too many scores bunching together. In that case, it would not remain in business very long.
That’s why items answered correctly by too many test-takers—typically, by more than 80 percent—are almost always deleted when the SAT is revised. What develops, therefore, is an educational Catch-22. The more successful that teachers are in teaching the most important content, and the better their students perform on items measuring this content, the more likely it is that the items will subsequently be removed.
To avoid that possibility, the makers of the SAT deliberately build in items that disproportionately assess what students bring to class in the form of their socioeconomic backgrounds, because this approach has consistently yielded the necessary differences among test-takers.
This strategy, however, has serious side effects for both teachers and students. Teachers feel utterly demoralized by their inability to imbue their students with the knowledge and skills necessary for success on this dominant testing instrument, and students feel frustrated by the impact that their family backgrounds have on their performance. The fact that SAT scores are tightly linked to the ZIP codes of students validates their disaffection.
Despite efforts to lessen the connection, the SAT has only managed to paint itself into a corner. No matter how carefully its makers try to develop items that minimize the role socioeconomic factors play in performance, the test ultimately is forced to rely on them in order to deliver the rankings that are the sine qua non of standardized assessment.
Recognizing the intrinsic unfairness of the SAT, Richard C. Atkinson, then the president of the University of California, in 2001 proposed eliminating the SAT requirement for the UC system’s applicants. In its place, he favored using standardized achievement tests believed to measure mastery of specific subjects students have taken in high school. In an attempt to placate its biggest client, the College Board undertook the most ambitious rewrite in its history. The SAT jettisoned word analogies, included tougher math items, and required an essay. While these changes provide a wider range of assessment, the new SAT must still create score spread to allow schools to rank applicants.
So after 80 years in existence, the test that has determined the destiny of so many college hopefuls ironically finds itself hopelessly deficient. The College Board will protest that the SAT’s power is exaggerated, but the evidence it presents won’t stand up to scrutiny. But then again, that’s something any high school senior who has recently sweated out the admissions ordeal knows all too well.
A version of this article appeared in the June 14, 2006 edition of Education Week as UnSATisfactory