It seems President Clinton’s proposed voluntary national tests face hurdles beyond those posed by politics and Capitol Hill lawmakers.
Subcontractors working on the technical issues of how to make the tests valid and reliable are having trouble making sure the scores of the planned exams in 4th grade reading and 8th grade math can be connected to those on the National Assessment of Educational Progress. The experts, working under a contract from the Washington-based nonprofit group American Institutes for Research, presented their findings here last week at the Council of Chief State School Officers’ conference on large-scale assessment.
But that linkage has been one of the selling points of the proposed tests: Individual students taking the exams would be able to learn how they would have fared had they also taken NAEP, a survey of student achievement.
In a simulation, researchers with the Educational Testing Service in Princeton, N.J., found it would be virtually impossible to provide a single-score result for how students would have performed on NAEP. Indeed, students who scored at a given level on the simulated national test fell into wide ranges of scores on NAEP.
With such ambiguous results, even broadly classifying a student into the “basic” or “proficient” NAEP categories became a “coin toss,” said Paul L. Williams, who ran the simulation at the ETS. He said the researchers would continue working on the problem.
A provocative videotaped study shown at the testing conference had federal officials shaking their heads and vowing to make changes to U.S. research surveys of students and teachers.
A team from American Institutes for Research asked 4th and 8th graders to answer the same types of questions about their families and school practices that are routinely asked in such large studies as NAEP. Those background questions help provide context for the results obtained from the subject-area questions asked of students.
The problem is, the students, especially 4th graders, often fail to understand what the background questions are asking, according to the study presented by Roger Levine, the director of the cognitive-survey laboratory at the Palo Alto, Calif., office of the AIR.
As a video camera ran, researchers asked students to think aloud as they went through some typical background questions with Mr. Levine and his team. It turns out 4th graders are very literal-minded.
When the children were asked, “Does either your mother or your stepmother live at home with you?” several children from two-parent families answered no. One boy solemnly replied, “I don’t have a stepmother, and my mother works, so probably no.”
Many children were thrown by a question about how many family members lived with them; the question had a 62 percent error rate in the study because the children often overestimated.
The problems extended to questions about academics. When given the chance to answer one question “undecided,” the 4th graders made clear they did not know the meaning of that word; another question tripped them up with the noun “novel.”
But such difficulties are easy to fix, Mr. Levine found. Changing “undecided” to “not sure” and “novel” to “book with chapters” significantly improved the accuracy of student responses. Based on the findings, Mr. Levine said, changes are already being made on upcoming NAEP exams.
The people who design and rely on the results of large-scale assessments place too much emphasis on using those tests to determine whether educational spending and programs are achieving their intended results, a speaker told a luncheon audience here.
Instead, said W. James Popham, a Koloa, Hawaii-based emeritus education professor at the University of California, Los Angeles, the assessment community should seek a greater balance between that “accountability” function and using large-scale tests to improve instruction.
He argued that large-scale assessment is headed in a “fundamentally dangerous” direction. “The focus is on accountability,” he said. “The focus is not on helping kids learn better.”
“Many people in large-scale assessment don’t know squat about instruction,” he said, and they need to learn.
At a later session, Linda A. Bond, a national assessment consultant for the test publisher CTB/McGraw-Hill, said that state testing officials and test designers do care about improving instruction, but they must meet the demands of lawmakers. “It’s very difficult to design a test that is very useful for accountability purposes and for instructional purposes,” she said.
One of the more popular small-group sessions at the June 14-17 conference was billed as Assessment and Accountability Under Siege. But after one presenter saw the approximately 100 people in the audience, she renamed the session “Misery Loves Company.”
Using such analogies as white-water rafting, wackiest ship in the Army, and the orange highway-barrel syndrome, state testing administrators from Arkansas, Texas, and Ohio described many of the common obstacles they have encountered in building assessment systems that do the job and satisfy all stakeholders.
Among the problems they cited were the tension between maintaining the integrity of tests and the public’s perception that something is wrong with them because of their “secrecy,” and the contradiction between the public’s and policymakers’ demand for rigorous standards and the outrage expressed when large numbers of students fail and are denied diplomas or promotion to the next grade.
“How many masters can a state assessment system serve?” Patricia Porter, the assistant testing director in Texas, said in summing up.
Contrary to predictions, youngsters are sitting through an assessment designed to describe and analyze children’s transition into school and their progression through the 5th grade.
“The children seem to enjoy it,” said Jerry West, the project director for the Early Childhood Longitudinal Study, sponsored by the National Center for Education Statistics.
About 2,800 kindergartners and 1st graders in 51 schools took part in a field test of the project in the 1996-97 school year. The formal K-5 study is slated to begin in the fall in about 1,000 schools in 41 states and the District of Columbia. It will involve about 23,000 kindergartners.
Many early-childhood educators worried initially that the little ones would be unable to sit still for it. But Mr. West said the children view the test, which takes 45 minutes to an hour and is administered one-on-one, as a game.
Results from the field test also suggest that the children made great academic gains from kindergarten to 1st grade regardless of their race or ethnicity. But they ended up at different points because they started at different points.
A second NCES study, which will follow 10,000 to 15,000 children from birth to 1st grade, will try to uncover why.
A version of this article appeared in the June 24, 1998 edition of Education Week as Linking Proposed National Tests To NAEP Bedevils Researchers