In late September, thousands of students in Singapore sat down to take 90-minute tests in mathematics and science. Although they may not have known it at the time, the students were kicking off what many are calling the largest, most complex international study of students’ mathematics and science achievement ever undertaken.
The Third International Mathematics and Science Study, sponsored by the International Association for the Evaluation of Educational Achievement (which goes by the initials IEA), will involve more than 50 countries and 1 million students worldwide. The cost of simply administering the exam to the 20,000-plus students who will be taking it in this country is expected to top $3.5 million this year alone.
But beyond the sheer size of the effort, the study is important because it will attempt to shed light on one of the most central questions in education: What works? “I can probably sit here and predict which countries will do well and which will do poorly, and I don’t think we need another study to do that,” says William Schmidt, a Michigan State University statistics professor who is directing the United States’ participation in the effort. “What we’re really trying to understand here is why.”
Not since the successful launch of Sputnik in the 1950s have international comparisons of student achievement generated so much attention in this country. The impetus for the current interest comes in part from the national education goals, which were written into federal law just this year. Among the goals is one that calls for American students to be “first in the world” in mathematics and science by 2000. International education comparisons such as the IEA study should help educators and policymakers figure out exactly what that means.
Such studies also provide a natural laboratory for education researchers. “Given that many people are reluctant to conduct controlled experiments with our children’s education,” a National Academy of Sciences panel pointed out last year, “comparison of natural variation is usually the most feasible way to study the effects of differing policies and practices.”
IEA, a private group headquartered in the Netherlands, has been conducting international comparisons since the 1960s. Thus far, the United States has turned in mixed performances. On the organization’s Second International Mathematics Study, conducted during the 1981-82 school year, U.S. students scored in the middling range on test items involving arithmetic and algebra and below the international average in geometry and measurement. On a more recent IEA reading test, however, American students ranked near the top.
Some critics argue that such international studies are simply “horse races” that rank students with little regard to the complex cultural, educational, and demographic differences among participating nations. They point out, for example, that students in some participating countries may not have been taught some of the material covered on the tests.
A 1992 analysis by University of Illinois researcher Ian Westbury of the IEA’s second mathematics study found that the tests used were tailored more closely to Japan’s mathematics curriculum than the United States’. Japanese students consistently outscored those from most other nations. In areas where the Japanese curriculum was less well-matched to the assessment, U.S. students’ scores were comparable to those of the Japanese students.
The new IEA study, which has been in the works since 1990, was designed to address these criticisms and others. Andrew Porter, a University of Wisconsin researcher who sits on a National Research Council board that oversees U.S. participation in international comparisons, points out that the assessment will include more subjects and nations and focus more attention on classroom contexts than any of its predecessors.
The assessment will be administered in each country to three specific groups of students: those in the two adjacent grades containing the most 9-year-olds, those in the two adjacent grades containing the most 13-year-olds, and those in their final year of precollegiate schooling. Between 15,000 and 20,000 students will be involved in each country. The basic test consists of 70 multiple-choice questions and 30 longer open-ended questions. In addition, smaller subgroups of students will be given an hour-long performance assessment, which may ask them to conduct a physics experiment or work out and explain in writing a complex mathematical problem. Students who are in some way specializing in mathematics and science in school will take different forms of the tests from those given to other students. In the United States, that group includes high school students taking advanced classes in those subjects.
But the examination part of the study sheds light only on what the researchers call the “attained curriculum,” or what students have learned. To gather clues on the “intended curriculum” (what they should have been taught) and the “implemented curriculum” (what they were actually taught), researchers have developed other measurement devices.
They have begun to analyze, for example, the most widely used math and science materials in all the participating nations. “In countries that have very centralized education systems, that task is easy,” says Schmidt. “In other countries, like the United States, that could mean a lot of texts.”
In all, the researchers collected more than 1,200 textbooks and other curricular materials. They analyzed their content, cut them into pieces, and coded each piece. Then all the information was fed into computers. The preliminary results of that analysis are expected to be completed by early next year. But according to Schmidt, the research is turning up “astronomical differences among countries in regard to what is considered mathematics and science.”
For clues regarding the “implemented curriculum,” the researchers will be surveying students and teachers at the schools where the testing is taking place. Students will be asked about their home backgrounds as well as their classroom experiences. Teachers will be asked to provide sample lesson plans, among other things.
Three countries taking part in the study--Germany, Japan, and the United States--are paying for small groups of researchers to videotape typical lessons in participating 8th grade classrooms. “In isolation that would not be much,” Schmidt says, “but in the context of the larger study, it does provide useful data.”
The sheer number of countries participating in the study presented several hurdles. Just reaching a consensus on the framework for the exams took two years. Haggling over specific test items took almost as long. Indonesia, for example, complained about test items that refer to seasons, since there are none in that country. Norway, on the other hand, complained that the earth-science section of the exam did not address ice forms--common knowledge for students in that part of the world. “We decided that since we cannot be fair to everyone we will be equally unfair to everybody,” says Albert Beaton, the Boston College professor who is coordinating the international effort.
The first round of testing, which began with Singapore, has been taking place throughout the Southern Hemisphere this fall. Students in the United States and other Northern Hemisphere nations will be tested in the late spring. The final results of the assessment will not be available until 1996.
As ambitious as the study is, it is not expected to provide definitive answers to many of the questions educators and policymakers have about U.S. schools and students and their counterparts overseas. “To get to that precise level,” Porter says, “requires finer-grained and more carefully controlled studies.” But, he adds, “this will go a lot further in that direction than any past international assessment of student achievement.”
A version of this article appeared in the November 01, 1994 edition of Teacher as The Test Heard Round The World