Test Dilemma: Revisions Upset Trends in Data

By David J. Hoff — May 02, 2001

By the time 2012 rolls around, Kentucky’s residents will almost certainly have a testing system that describes how well the state’s children are learning. What they won’t know is how far they’ve progressed since 1992, the advent of the state’s landmark school overhaul.

Because the state switched testing programs in 1998, it will lose the trend lines established in 1992 that many educators and policymakers expected to carry through the 20-year quest to improve student achievement. And when testing programs are reworked, or sometimes just tinkered with, the ability to draw direct comparisons of student achievement over time is lost—a situation that every state with a testing and accountability program will likely face if it keeps its programs updated.

“People like me thought it would stay the same for 20 years, but that was naive,” said Robert F. Sexton, the executive director of the Prichard Committee for Academic Excellence, a Lexington-based backer of Kentucky’s school improvement efforts. “I don’t think anybody thought of the nuances and the ins and outs over a 20-year period.”

Comparing test scores from one test to another is like comparing race times on different marathon courses.

That’s just what Kentucky did. The new testing program started assessing students’ skills in some subjects at different grade levels. What’s more, the new program de-emphasizes so-called performance questions, such as writing essays and showing mathematical reasoning step by step, which are hard to compare from one test to another.

New standards are being written for the new tests. Since 1998, though, the state has used the performance standards from the old test for its accountability system as an interim measure.

State officials in Indiana, New York, and Ohio are learning a lesson similar to Kentucky’s as they begin to modify their own testing programs and the standards that outline how students should perform. While neither Indiana nor Ohio is overhauling its testing system as Kentucky did, both are adding new subjects and changing the grade levels at which they test. Consequently, they will be forced to re-evaluate their standards and find ways to connect student achievement from one testing system to another.

“Psychometrically, comparison can be done,” said Mary Tiede Wilhemus, a spokeswoman for the Indiana education department. “But there has to be a caveat. There’s no way around it.”
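One common psychometric technique for the kind of comparison Ms. Wilhemus describes is linear equating, which places scores from a new test on the scale of an old one by matching means and standard deviations. The sketch below is illustrative only; all numbers are hypothetical, and real equating studies use common items or common examinees rather than summary statistics alone.

```python
# A minimal sketch of linear (mean-sigma) test equating.
# Hypothetical figures -- not drawn from any actual state test.

def linear_equate(new_score, new_mean, new_sd, old_mean, old_sd):
    """Map a score from the new test's scale onto the old test's scale
    by matching the two tests' means and standard deviations."""
    z = (new_score - new_mean) / new_sd   # standardize on the new scale
    return old_mean + z * old_sd          # re-express on the old scale

# Suppose the old test averaged 500 (sd 100) and the new test
# averages 70 (sd 12). A new-test score of 76 is half a standard
# deviation above average, so it equates to 550 on the old scale.
print(linear_equate(76, new_mean=70, new_sd=12, old_mean=500, old_sd=100))
```

The caveat the quote points to is visible even here: the mapping assumes the two tests measure the same construct with comparable precision, which is exactly what changes when a state revises what and how it tests.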

“Ours may not be as dramatic” as Kentucky’s, said Bob Bowers, Ohio’s associate superintendent for curriculum and assessment, referring to his state’s changes, “but we will have to adjust our trend line a little.”

New York, meanwhile, has abandoned trend data on its Regents exams several times in the 136-year history of that testing system.

By raising the standards and requiring all prospective graduates to pass the English and mathematics tests, the state essentially declared the previous test scores to be of historical note only, said Roseanne Y. DeFabio, the state’s assistant commissioner for curriculum, instruction, and assessment.

“In every subject, we look at the number of students reaching the standards rather than trying to suggest that the exams are equated from the old to the new,” she said.

‘The Best for Kids’

The only way to preserve the historical data, testing experts say, is to ignore the evolving improvements in the world of assessment. But they recommend that states continually review their testing practices so they can incorporate all the advances in methods and update performance standards to meet changing expectations. If they need to sacrifice the longitudinal data in the process, they shouldn’t hesitate, those experts say.

“If I have to pick between doing the best for kids or having a consistent trend line, I’m going to pick doing the best for kids,” said Andrew C. Porter, the director of the Center for Education Research at the University of Wisconsin-Madison and an adviser to Kentucky officials.

But advocates of the testing and accountability movement that’s under way nationwide are urging states to do everything they can to maintain data that compares achievement across time.

“The whole idea of how you continuously improve your standards and assessment and accountability while continuing to track kids over time is a huge issue,” said Matthew Gandal, the vice president of Achieve, a Cambridge, Mass.-based coalition of governors and business executives.

Mr. Gandal encourages all states to change their testing systems and performance standards so they reflect the latest knowledge of testing experts. But he expects that most will be able to preserve the achievement data.

“I would guess it would be a rare state” that has to lose its trend data, he said.

Differing Results

Kentucky inaugurated its Kentucky Instructional Results Information System, or KIRIS, in 1992 and declared it would be the basis for monitoring schools’ progress toward the 20-year goal that every student would reach the “proficient” level on the state’s performance standards.

In the six-year life of the program, members of the public criticized KIRIS for not producing scores that could be compared against national norms, meaning how students across the country perform on similar tests. Researchers also said the scores weren’t accurate enough to use in the state accountability system.

By 1998, the state legislature had replaced KIRIS with the Commonwealth Accountability Testing System, or CATS. The new program not only includes national norm-referenced sections, but has de-emphasized the performance assessments that made it difficult for KIRIS to produce an accurate gauge of student achievement. The program also changed the grade levels at which some subjects are tested.

After such changes, “a direct comparison [between KIRIS and CATS] is fraught with problems,” said John P. Poggio, a professor of psychology at the University of Kansas, in Lawrence, and the vice chairman of the board of technical advisers to Kentucky.

“Everybody wants to do a comparison to what it looked like in 1998, to what it looks like in 2000,” he said. “People have to recognize that this is a new program.”

To set new standards for the new tests, the state has engaged 1,650 teachers over the past year. Their task was to decide what students should know and be able to do to meet each of the state’s four performance categories: “novice,” “apprentice,” “proficient,” and “distinguished.”

The state school board reviewed that work last month and is expected to take up the subject again this week.

As they review the teacher panels’ reports, state officials are questioning why the proposed standards for CATS yield different results from the KIRIS ones. For example, 31 percent of elementary school students who took KIRIS in 1998 ranked as proficient in reading. Two years later, that proportion jumped to 52 percent under the standards proposed for CATS. By contrast, 26 percent of high schoolers scored as proficient in reading under KIRIS in 1998, but only 21 percent of the same age group would have achieved that level under CATS in 2000.

Helen W. Mountjoy, the chairwoman of the state board, said its members have to understand why the CATS results sometimes diverge from the ones on KIRIS before they adopt new standards.

But Mr. Poggio says the performance levels for the former tests were incorrectly established, in contrast to the sophisticated and thorough process he says the state used to set the standards for CATS.

Has Improvement Occurred?

After the Kentucky state board decides how to reset the standards, it will need to figure out a way to explain the changes to the public, which is sure to be skeptical of wild fluctuations in scores.

Because so many teachers participated in setting the CATS standards—something that didn’t happen with KIRIS—many of them will be able to explain the standards’ content and meaning to their colleagues, Ms. Mountjoy said. “What it’s done is provide a level of credibility that we didn’t get [under KIRIS].”

But the board also needs to deal with the inevitable question: How can we tell whether student achievement has improved since 1998?

While education officials won’t be able to make direct comparisons on the two testing systems, Ms. Mountjoy said, they can look for clues through scores from other programs, such as the National Assessment of Educational Progress.

For the CATS program, however, there’s nothing to say other than that it’s a new starting point.

“You have to belabor the point that this is where we stand today,” Mr. Poggio said.

And even though schools’ scores may vary from KIRIS to CATS, none of them will be close to reaching the goal that all Kentucky students attain the proficient level by 2014, the new target date for that achievement.

“Even if some schools get an artificial jump [from CATS], they still have this incredibly challenging goal in front of them,” Mr. Sexton of the Prichard Committee said.

A Word of Advice

Testing programs tend to last only about six years before policymakers start fiddling with or overhauling them, according to Mr. Poggio.

In Kentucky, state leaders are hoping that the current round of standards-setting will last for a while.

“I’m not sure we’ll make it to 2014,” said Gene Wilhoit, the state commissioner of education. “But it’ll be nice to have some stability in the system.”

A version of this article appeared in the May 02, 2001 edition of Education Week as Test Dilemma: Revisions Upset Trends in Data

