Charting and Adjusting Test Scores: A Proposal (Opinion)

Ramsay W. Selden

Ramsay W. Selden is director of the state education assessment center at the Council of Chief State School Officers. The views expressed are his own.

Fair reporting of standardized-test results for comparative purposes poses a chronic problem: how--and whether at all--to account for demographic factors in displaying and using scores.

While the gaps are narrowing, low-income, minority, and limited-English-proficient students still score significantly lower than other students on achievement and aptitude assessments. And school districts with poorer economic bases are less able to marshal the resources necessary to offer the most effective academic programs.

Students’ socioeconomic background should be taken into account when we compare test scores for schools, districts, states, or nations. With new approaches to reporting, we can factor in these variables without foreclosing or foreshortening expectations for poor schools and disadvantaged students.

It is important to remember that the correlation between socioeconomic variables and standardized-test performance is only general, not absolute. Some low-income African-American and Hispanic students score at the very top of such assessments, and some middle-class whites at the very bottom. This fact suggests that over time, group differences can be eliminated.

Still, the statistical association between family background and test performance is strong. The mother’s level of education generally has been found to be the most powerful single predictor of achievement among such factors, which can also include family income, mother’s or father’s occupation, father’s education, or race (not in itself necessarily equivalent to socioeconomic status).

The connection of these variables with achievement can be explained partly as a developmental phenomenon and partly as a testing phenomenon. According to the developmental view, households with more resources and higher levels of parent education provide children with subtle but direct supplementary “instruction”: books in the home; trips to museums; parents’ reading to children and drilling them in language usage, math concepts, or scientific information; pressure and support to excel in school and on tests. The testing explanation holds that it is virtually impossible to escape group differences in knowledge as determinants of group differences in test performance. After the most obviously biased test items are eliminated, more advantaged students still have more opportunities to pick up--both in school and outside it--the information and skills typically tapped in the tests.

The problem, as assessments have become pervasive and powerful parts of the educational landscape, is how to take this association into account in the comparative reporting of scores. Schools, districts, and states serving relatively large proportions of disadvantaged students argue that reporting their results in absolute terms is not fair. Such students bring extra needs and require more services to reach average or higher levels of achievement. Therefore, these schools and systems contend, they should be compared, at least over the short run, with others serving similar populations.

Even some relatively advantaged schools ask for comparison with others similar to them, so that they can judge their performance more accurately. Other affluent systems fear adjusted scores or comparison bands because these methods of displaying results may diminish their standing.

But reporting results in relation to background factors creates a dilemma: While doing so may more accurately represent schools’ performance, it can institutionalize and perpetuate lower expectations for minority or disadvantaged students.

State and local assessment programs use a variety of practices to account for student background in reporting scores. The major approaches--each with its own advantages, disadvantages, and effects--include the following:

Adjusting results. Scores can be statistically adjusted to eliminate the effects of background variables, once those effects are known. A coefficient is used to calculate where the scores would be if the effects of a variable did not exist. The advantage of this method is simplicity: It provides one number that includes both a portion of absolute performance and a portion of adjustment based on socioeconomic factors. The disadvantage is that it is impossible to tell how much of the single score that we see results from raw performance and how much from the adjustment. If a school appears to do well, is that because its performance was high or because its adjustment was high?
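To make the adjustment concrete, here is a minimal sketch in Python. It assumes a single socioeconomic index per school and a straight-line relationship between that index and scores; the function names and the data are hypothetical, and actual state programs may use several variables and different models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adjusted_scores(ses, raw):
    """Remove the estimated linear SES effect from each raw score.

    Each school is moved to where it would stand if it had the average
    SES value; the slope plays the role of the adjustment coefficient.
    """
    slope, _ = fit_line(ses, raw)
    mean_ses = sum(ses) / len(ses)
    return [score - slope * (s - mean_ses) for s, score in zip(ses, raw)]


# Hypothetical data: an SES index and a raw average score for four schools.
ses = [1.0, 2.0, 3.0, 4.0]
raw = [50.0, 58.0, 61.0, 71.0]
adjusted = adjusted_scores(ses, raw)
```

Reporting the raw score and the size of the adjustment side by side, rather than the adjusted number alone, would be one way to keep the two components distinguishable.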

Reporting by comparison bands. In this approach--pioneered by California--schools’ raw scores are displayed, but they are shown in the context of schools ranking immediately above and below on a composite measure of socioeconomic status. In this way, the reader can see how a school is doing not only in relation to comparable schools but also, at least ostensibly, in its overall performance on an unadjusted scale. Using a similar method, Massachusetts groups districts into community types--such as urbanized centers, residential suburbs, and small, rural communities--to display test results.
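A comparison band can be sketched in a few lines of Python. The sketch assumes each school already has a composite socioeconomic index; the field names, the band width `k`, and the data are hypothetical and are not taken from California's actual procedure.

```python
def comparison_band(schools, target, k=2):
    """Return the target school with up to k neighbors on each side by SES rank.

    Raw scores are left untouched; only the set of schools shown
    alongside the target is chosen by socioeconomic rank.
    """
    ranked = sorted(schools, key=lambda school: school["ses"])
    idx = next(i for i, school in enumerate(ranked) if school["name"] == target)
    return ranked[max(0, idx - k): idx + k + 1]


# Hypothetical schools with a composite SES index and a raw score.
schools = [
    {"name": "A", "ses": 0.9, "score": 61},
    {"name": "B", "ses": 0.2, "score": 48},
    {"name": "C", "ses": 0.5, "score": 57},
    {"name": "D", "ses": 0.4, "score": 50},
    {"name": "E", "ses": 0.7, "score": 55},
]

band_for_c = comparison_band(schools, "C", k=1)
```

Schools at either end of the ranking get a shorter band, because there are no schools beyond them to compare against.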

Groups in California concerned with the education of minority students have contended that absolute performance is obscured by the comparison-band system, with the effect of foreshortening expectations for poor students and their schools. Last year, a task force recommended that the practice be discontinued, but so far it has been kept in place.

Evaluating actual versus predicted performance. South Carolina calculates predicted scores on the basis of students’ prior performance and demographic characteristics. Schools’ performances are then evaluated against the expected levels.

This system--similar to the adjusted-results method--theoretically allows attention to actual levels of performance. But many would argue that it suffers the same liability as the California or Massachusetts approach--clouding absolute differences in performance that should be addressed and eventually eliminated.

Disaggregating results and other methods. Scores can be reported by student subgroup or by school, district, or state “type.” The Montgomery County, Md., district, for example, has tried in its reporting to equalize distributions of minority and low-socioeconomic-status students across quartiles of performance. Some educators recommend reporting “costatistics” with test results, such as the percentage of minority students in each school or per-capita income for states.

Should we account for background at all in monitoring education? If we expect all students to do well and if we expect group differences to be overcome, the argument goes, then we should not institutionalize different expectations or rationalize the poor performance of some schools.

But is it fair to compare Mississippi with Minnesota on student achievement, or suburban Jackson with rural Selma, when differences are so obviously based in part on social and economic conditions? Wouldn’t it be more useful to show how low- and high-performing schools stand in relation to schools doing better or worse than they, but serving students with similar needs or advantages?

Nowhere that I know of do we report to a poor black or Hispanic student, “You are doing well, for a disadvantaged minority.” Instead, we give each student and parent the straight scores in absolute terms. This practice seems to say that we have the fundamental perception and profound hope that each child can and will do well, regardless of circumstances.

But when we use results of assessment to monitor and direct educational programs rather than individual students, fairness, validity, and utility demand that background factors be somehow incorporated, at least until group differences are eliminated. And the need to take such influences into account will become even more urgent as state-by-state and international reporting of educational conditions increases over the next few years.

Following is a chart that attempts to move the reporting of test results in relation to socioeconomic variables one step further in validity and responsibility. This proposal preserves the pre-eminence of absolute performance, but it also reflects level of achievement relative to background conditions. In addition, it illustrates schools’ progress in eliminating differences based on socioeconomic status.

The proposal uses as its example a simulated report that could be made of states’ results on the comparative achievement testing to be done through the National Assessment of Educational Progress in 8th-grade mathematics in 1990.

The display shows each state’s average level of performance on an overall mathematics-achievement scale; more detailed information on performance in sub-areas of math would be provided elsewhere. Instead of placing states in alphabetical or pure rank order, the chart arrays them from low to high by their average level of student socioeconomic status, as measured by appropriate variables.

The curved line indicates the performance of states as predicted solely on the basis of their socioeconomic conditions. This line could be shaded to moderate its visual importance.

The display has several advantages. Through the relative length of the bars, it represents overall rank-order performance among the states on an absolute scale. It also indicates whether a given state’s performance is higher or lower than that of states near it in socioeconomic level. Through the length of the state’s bar on the graph in relation to the curved line, the chart shows the level of a state’s performance in terms of its own socioeconomic conditions. Finally, over time, it will reveal whether the gaps created by the connection between socioeconomic background and performance are being closed. We might, for example, project the curved line for 1992: As these differences are overcome, the line will flatten, and our national progress in diminishing their importance can be tracked graphically.
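The data behind such a display could be assembled roughly as follows. This Python sketch makes several assumptions: a straight line stands in for the curved prediction line, each state has a single socioeconomic index, and the state names and scores are invented for illustration rather than drawn from any assessment.

```python
def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def chart_rows(states):
    """Order states from low to high SES and attach the predicted score and gap.

    Returns the slope of the prediction line and one row per state:
    (name, actual score, predicted score, actual minus predicted).
    """
    slope, intercept = least_squares(
        [state["ses"] for state in states],
        [state["score"] for state in states],
    )
    rows = []
    for state in sorted(states, key=lambda item: item["ses"]):
        predicted = intercept + slope * state["ses"]
        rows.append((state["name"], state["score"], predicted, state["score"] - predicted))
    return slope, rows


# Hypothetical states and scores for two assessment years.
results_1990 = [
    {"name": "State B", "ses": 2.0, "score": 262.0},
    {"name": "State A", "ses": 1.0, "score": 250.0},
    {"name": "State C", "ses": 3.0, "score": 270.0},
]
results_1992 = [
    {"name": "State B", "ses": 2.0, "score": 265.0},
    {"name": "State A", "ses": 1.0, "score": 258.0},
    {"name": "State C", "ses": 3.0, "score": 270.0},
]

slope_1990, rows_1990 = chart_rows(results_1990)
slope_1992, rows_1992 = chart_rows(results_1992)

# A smaller slope in the later year means the prediction line has flattened.
line_is_flattening = slope_1992 < slope_1990
```

The bar lengths come from the actual scores, the line from the predicted scores, and the year-to-year change in slope gives a single number for tracking whether the socioeconomic gap is closing.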

To signify statistically meaningful differences in scores, confidence intervals--the margins beyond which differences are too large to be attributed to chance--could also be marked in the graphics without making the display too complex.
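As a rough illustration of how such intervals might be computed, the Python sketch below assumes that each state's average comes with a standard error supplied by the assessment program, that a normal approximation is adequate, and that the two states' samples are independent. The function names and figures are hypothetical.

```python
import math

Z_95 = 1.96  # normal critical value for a two-sided 95 percent interval


def confidence_interval(mean, std_error, z=Z_95):
    """Lower and upper bounds of the interval around a state's average score."""
    half_width = z * std_error
    return mean - half_width, mean + half_width


def differs(mean_a, se_a, mean_b, se_b, z=Z_95):
    """True if the gap between two independent averages exceeds its margin of error."""
    margin = z * math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) > margin


# Hypothetical averages and standard errors for two pairs of states.
low, high = confidence_interval(262.0, 1.0)
three_point_gap = differs(262.0, 1.0, 265.0, 1.0)
two_point_gap = differs(262.0, 1.0, 264.0, 1.0)
```

With these invented figures, a three-point gap clears the margin and a two-point gap does not, which is the distinction the markings on the chart would convey.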

This approach--adaptable for schools, districts, or nations as well as states--is one solution to a dilemma in reporting test results: It helps reconcile the immense social goods of making fair comparisons of school systems and holding equal expectations for all students’ learning.