The 'Goldilocks' Phenomenon
States have had to start somewhere, but how do they know their standards are 'just right'?
A recent article in these pages illuminates what I at times describe as the "Goldilocks" phenomenon in standards-setting nationally. Showing the impact of geography on acceptable standardized-test scores, the article highlights the mismatch between what states define as proficient student performance and what the National Assessment of Educational Progress deems proficient. ("A 'Proficient' Score Depends on Geography," Feb. 20, 2002.) The gap ranges from 52 percent more Oklahoma 8th graders being judged proficient in math on the state test than on NAEP, to 12 percent more Idaho students rating proficient on NAEP than on their state's test. The article also recounts the results of a study commissioned by Denver-area superintendents that found the Colorado standards-based exam high school students take to be more difficult than the SAT or the ACT.
The "Goldilocks" question is: Are standards too high, too low, or just right? Should Oklahoma raise its standards? Should Idaho and Colorado lower theirs? Is NAEP wrong to begin with? More fundamentally, what does it mean to "raise" or "lower" these standards? No one really knows for certain. Neither does anyone know exactly how to find out.
Psychometricians have a bag of tricks to tell how the scores students receive on one test are likely to relate to their performance on another test. These techniques, called test equating, have not yet been able to match up state standards-based tests with NAEP. States have employed expert panels that meet and use the "I know it when I see it" method, more properly known as bookmarking, to review standards and select those they believe all students at a particular level should know. The various performance levels on state assessments are designated by equally approximate means.
States have had to start somewhere. But how do they know if their standards are "right"? One solution is to wait several years to conduct "predictive validity" studies, which follow students who take the test through subsequent grades and college to ascertain how their test scores correlate with subsequent grades or other measures. This is the primary means by which the SAT is justified. The problem with predictive validity is that it tells little about whether anyone is teaching the right things at the right level of difficulty, on either side of the prediction equation.
These strategies avoid one issue: What do students need to know and be able to do to succeed in their next educational environment? Once this is known, standards can be designed to get them there. The appropriate difficulty of assessment items can then be determined with some basis and precision. For example, the makers of the ACT designed their workplace-skills test, Work Keys, in just this fashion. The test, however, does not assess very well the academic content that constitutes the bulk of state high school standards.
Though the purpose of schooling should not necessarily be to prepare all students for college, more than 60 percent of graduating students do nevertheless enroll in some form of postsecondary education directly out of high school. It seems reasonable that the skills needed for college success would be at least one reference point for determining state assessment difficulty.
The problem that all states have faced is that colleges have not expressed their expectations in language comparable to state standards. The lingua franca of college admissions continues to be course requirements, course titles, and grades. States can't align standards and assessments with "four years of English, three years of math ... and a B average" even if they wanted to. Instead, they have had to set their standards based on what expert panels endorsed, on notions of what constituted "world class" expectations, and, in many cases, what was politically acceptable.
If postsecondary expectations could be determined, state assessments could at least be calibrated with one very tangible and important reference point. This would set the stage for predictive-validity studies. It would also open the door for frank conversations between K-12 education and higher education regarding their real expectations for students. Standards that were more firmly grounded would help teachers determine with greater confidence what was really important to teach. Students would be more likely to succeed in college and in the job market as well, given the increasing overlap between the skills required for success in college and those needed for the workplace.
One thing is about to change, however. Higher education is on the verge of defining its standards in a format comparable to that of state standards. Sixteen universities, all members of the Association of American Universities, have banded together with support from the Pew Charitable Trusts to create Standards for Success. The project's goal is to identify the key knowledge and skills entering students need for university success. To do this, the project has focused on the skills used and developed in entry-level college courses. The emphasis is on what actually occurs in freshman classes, not on what high schools "ought" to be doing.
In a series of meetings on university campuses across the nation, over 400 faculty and staff members who teach or advise freshmen described what is expected of students in university courses in major academic disciplines, including English, math, science, social sciences, foreign languages, and the arts. Many participants also contributed hundreds of samples of student work from their classes, work that illustrated how good is "good enough" in those courses. Such examples help to connect knowledge and skill statements with tangible levels of performance.
Faculty members also contributed course outlines from dozens of freshman entry-level courses and reviewed state standards documents to identify the specific standards that aligned most closely with their own expectations for students. This coming fall, the project plans to send to all high schools in the country a booklet and CD-ROM containing the statements of knowledge and skills, accompanied by faculty descriptions of what successful students do and corresponding examples of student work.
The project is also analyzing state assessments to determine the depth of knowledge required to answer items or complete performance tasks on the assessments. Each item or task is then reviewed to determine if it matches with any of the key knowledge and skills for university success. Released versions of the New York state regents' and the Massachusetts Comprehensive Assessment System's tests of English and math have been analyzed in a piloting activity. Tests from an additional two dozen states will be analyzed by the end of this summer. When the project is completed, states will know where their assessments stand relative to university expectations.
This project is the first of what may be a number of activities designed to provide reference points for state standards and assessments. For example, a consortium of organizations recently initiated the American Diploma Project to try to establish a common set of expectations for the high school diploma. If standards and assessments can be connected more closely and directly with the eventual outcome they seek to achieve, namely students' success beyond high school, the entire system will become more rational, grounded, and consistent.
Until then, we will witness the types of mismatches brought to light in Education Week, mismatches that will lead some to argue that standards should be raised and others to contend that they should be lowered. And we will have no way of knowing who is right.
David T. Conley is an associate professor of educational policy at the University of Oregon and the director of the Center for Educational Policy Research. He also directs Standards for Success, a project administered by the Association of American Universities with support from the Pew Charitable Trusts.
Vol. 21, Issue 29, Pages 40, 43
Published in Print: April 3, 2002, as The 'Goldilocks' Phenomenon