An 'Adaptive Testing' Precursor Can Offer Lessons for Today (Opinion)

Your In Perspective article on “adaptive testing” (“Adjusting to Test Takers,” Nov. 19, 2008) was of special interest to me, since work done in the 1970s in the Portland, Ore., public schools’ evaluation department, which I headed, was the precursor of this type of achievement testing.

The Northwest Evaluation Association, mentioned in the article, was first organized by me as a coalition of school districts in Oregon and Washington state. Its purpose was to administer hundreds of tests so that the department could determine the relative difficulty of items on a curriculum continuum extending from low grade 3 to high grade 8.

George Ingebo, our testing director, saw in the work of Georg Rasch, a Danish mathematician, the potential for constructing such tests. With Fred Forster, a computer and statistics expert recruited from the Chicago public schools’ research department, he designed hundreds of field tests with interlocking items to be administered by the cooperating districts. Using the performance results from this extensive field-testing, they assigned each item a difficulty level on an equal-interval scale.
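For readers unfamiliar with Rasch’s approach, the standard form of his model for items scored right or wrong is shown below. It is offered as a general illustration, not as a record of the Portland team’s exact computations. Here θ_n is the ability of student n and b_i is the difficulty of item i, both expressed on the same logit scale:

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}},
\qquad
\ln\frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} = \theta_n - b_i

% For any student n, the gap in log-odds between items i and j does not depend on \theta_n:
\ln\frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} - \ln\frac{P(X_{nj} = 1)}{P(X_{nj} = 0)} = b_j - b_i
```

Because the log-odds of success depend only on the difference θ_n − b_i, a one-unit gap between two item difficulties means the same thing anywhere on the scale, which is one way to read the letter’s “equal-interval” description.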

To analyze the field-test results, Fred used Portland’s giant Bonneville Dam computers, since district computers at that time were much slower.

This created a large bank of items calibrated in difficulty, with each item’s value placed on an equal-interval scale and referenced to a well-defined learning outcome. The task of writing items that measured attainment of outcomes was supervised by Linda Peters, who also wrote the outcomes to be tested.

The tests created from this item bank were administered fall and spring each year in grades 3 through 8 in all Portland elementary schools. They were instrumental in producing significant annual increases in achievement in each grade for a period of 14 years.

I retired in 1982. A few years later, the district sold the bank of calibrated items used to create Portland’s tests to the Northwest Evaluation Association, which used it, under the direction of Allan Olson, to develop tests similar to those created in Portland. The test company he created is the current Northwest Evaluation Association, whose Measures of Academic Progress, or MAP, assessment is, as your article reported, now used in 2,340 districts across the United States.

Adaptive testing by computer requires calibrating the difficulty of test items and referencing them to specific learning outcomes, just as the Portland Achievement Levels Tests did. These calibrations are often the result of refinements of Rasch’s methods that measurement specialists refer to as “item-response theory.” But the purpose of adaptive measurement is the same as the one originally established in Portland’s program and now pursued in NWEA’s MAP program: to discover where students are on a continuum of growth in order to help plan the next steps in learning. Typical standardized “grade level” tests, such as those used in state and federal No Child Left Behind Act accountability programs, do not yield the accurate measures of individual growth that are so important to teacher planning and to student success in learning.
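The purpose described above, locating a student on a continuum of calibrated items, can be illustrated with a short Python sketch. It is a generic textbook-style illustration under stated assumptions, not the Portland or NWEA procedure: the eight-item bank, the function names, the ±4 logit bounds, and the rule of always giving the unused item whose difficulty is closest to the current estimate are all hypothetical choices made for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical item bank: item id -> calibrated Rasch difficulty in logits.
# A real bank would hold many more items, each tied to a learning outcome.
ITEM_BANK: Dict[str, float] = {
    "A": -2.0, "B": -1.0, "C": -0.5, "D": 0.0,
    "E": 0.5, "F": 1.0, "G": 1.5, "H": 2.0,
}


def p_correct(theta: float, b: float) -> float:
    """Rasch probability that a student of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses: List[Tuple[float, int]],
                     lo: float = -4.0, hi: float = 4.0, tol: float = 1e-9) -> float:
    """Maximum-likelihood ability estimate, found by bisection on the score function.

    The score sum(x - p) decreases as theta rises, so it has at most one root.
    If every answer is right (or every answer wrong) there is no finite root,
    and the estimate is pinned to hi (or lo).
    """
    def score(theta: float) -> float:
        return sum(x - p_correct(theta, b) for b, x in responses)

    if score(hi) >= 0:
        return hi
    if score(lo) <= 0:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid  # observed score exceeds expected score: true estimate lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def run_adaptive_test(answer: Callable[[float], int],
                      num_items: int = 5) -> Tuple[float, List[str]]:
    """Give up to num_items items, each time choosing the unused item whose
    difficulty is closest to the current estimate (where a Rasch item is most informative)."""
    theta = 0.0
    remaining = dict(ITEM_BANK)
    responses: List[Tuple[float, int]] = []
    administered: List[str] = []
    for _ in range(min(num_items, len(remaining))):
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        b = remaining.pop(item)
        responses.append((b, answer(b)))
        administered.append(item)
        theta = estimate_ability(responses)
    return theta, administered


if __name__ == "__main__":
    # Hypothetical student who answers correctly every item easier than 0.8 logits.
    estimate, items = run_adaptive_test(lambda b: int(b < 0.8))
    print(items, round(estimate, 2))
```

For that hypothetical student, the sketch gives items D, H, F, E and G and ends with an estimate a little above 0.5 logits; with only five items the estimate is still rough. Bisection is used instead of Newton’s method because Newton steps can overshoot badly when only one or two answers are available.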

Your article mentioned the difficulty of applying principles of adaptive testing in fields of learning such as science and social studies. The reason for this seems obvious. As children advance beyond the basic elements of reading, math, and language usage into the upper grades and high school, the diversity of what is most helpful for them to learn increases. Measuring the relative difficulty of what children learn becomes far less important than helping them understand the relevance of what they study to their personal needs and societal obligations. This understanding cannot be measured in quantitative terms.

High school math and science courses that are considered essential for college-entrance exams, and that follow generally accepted learning sequences within subject-matter “disciplines,” should benefit from calibrated, item-adaptive testing. But courses and tests specially designed for students who need instruction in skills and knowledge of social and personal value normally would not be well served by such tests.

Victor W. Doherty

Mount View, Calif.

A version of this article appeared in the January 07, 2009 edition of Education Week as An ‘Adaptive Testing’ Precursor Can Offer Lessons for Today