International benchmarking tests always make headlines, but reading between the lines would help education researchers and policymakers better use the tests to improve schools.
That’s the upshot of a new report on international education assessments by the National Academy of Education and an accompanying article in the journal Science. A working group of testing experts in the Academy analyzed methods, use, and media coverage of large-scale tests including the Program for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS), and the Progress in International Reading Literacy Study (PIRLS) during the past 50 years.
“In general, equating of tests across countries is a good idea,” said Judith Singer, Harvard University education professor and chair of the report committee. “The challenge is, countries aren’t sports teams ... If you are reducing a country to a ranking, it is concealing so much.”
The group found that over time, international tests have come to be used not just to describe different education systems and track their changes, but also to compare countries as though they had de facto international standards, evaluate the effectiveness of policies or even curricula, and examine the relationships between schools’ contexts and student achievement.
“We should use international assessments for the purposes they serve best, rather than squeezing into them things they will never do very well,” said Andreas Schleicher, the director of the OECD’s directorate for education and skills, who was not part of the report group. “International assessments like PISA are extremely valuable tools for generating hypotheses. Who would have thought we’d be able to learn something from Finland or Shanghai? They open our eyes about what we think. ... but it will never get to the causal answers. I do not see international assessments as substitutes for causal instruments like [randomized controlled trials], but as complements.”
The report suggested several ways to make international test data more useful for practitioners and policymakers. Here are a few:
1. Stop pulling rank.
The report noted that results of global tests typically list countries in rank order, even though many countries’ average scores are statistically indistinguishable from those of the countries listed just above and below them.
This can lead policymakers to adopt individual policies from the leading countries without determining whether they caused that country’s achievement, Singer said. “A few years ago it was homework: Finland was on top, Finland doesn’t assign homework, so maybe we shouldn’t assign homework. Then a few years later, Finland isn’t on top anymore, so maybe eliminating homework isn’t the answer,” she said. “It’s a fool’s errand to try to identify the one secret sauce and try to roll it out. We do have a lot to learn, but we are not thinking broadly enough.”
Focus on rank can also give the wrong impression of a country’s performance. For example, Japan’s rank on the PISA in science rose from fourth in the world in 2012 to second in 2015, and local newspapers trumpeted its improvement. But the mean test scores on science literacy during that time actually declined, from 547 to 538.
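For readers who work with the score tables directly, the report's point about statistical ties can be sketched in a few lines of Python. The country names, mean scores, and standard errors below are invented for illustration and are not drawn from PISA, TIMSS, or PIRLS. The check treats two averages as indistinguishable when their gap is smaller than roughly two standard errors of the difference, a common simplification that ignores the multiple-comparison adjustments official reports may apply.

```python
import math

def statistically_tied(mean_a, se_a, mean_b, se_b, z_crit=1.96):
    """Return True when two mean scores cannot be told apart at roughly the 95% level."""
    # Standard error of the difference between two independent means.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) < z_crit * se_diff

# Hypothetical (invented) results: name -> (mean score, standard error).
results = {
    "Country A": (531.0, 2.5),
    "Country B": (528.0, 3.0),
    "Country C": (510.0, 2.0),
}

# Rank countries by mean score, highest first, as a league table would.
ranked = sorted(results.items(), key=lambda item: item[1][0], reverse=True)

def tie_report(ranked):
    """Compare each country with the one ranked directly below it."""
    report = []
    for (name_a, (mean_a, se_a)), (name_b, (mean_b, se_b)) in zip(ranked, ranked[1:]):
        report.append((name_a, name_b, statistically_tied(mean_a, se_a, mean_b, se_b)))
    return report

for name_a, name_b, tied in tie_report(ranked):
    label = "statistically tied" if tied else "distinguishable"
    print(f"{name_a} vs. {name_b}: {label}")
```

In this made-up table, the first- and second-place countries differ by less than the margin of error, so the ordering between them carries little information, while the gap between second and third place is large enough to register.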
The report called for tests to include more context on comparison groups, such as matching countries to others with similar resources and contexts and comparing education systems like Singapore’s to those of cities or states rather than countries. For example, a recent report based on PISA data compared how quickly Canada and the United States improved achievement for new immigrant students.
William Schmidt, director of the Education Policy Center at Michigan State University, was not part of the study committee, but he agreed that tests should provide more information about countries’ policy contexts. However, he cautioned against focusing too much on individual state or city scores.
“If you believe these studies are really about comparing where the United States is, you really need the United States” as the point of comparison, Schmidt said. “I’m not very comforted by Massachusetts doing well. If it was a horse race, maybe you’d want to put up our best against their best, but ... the bigger issues are our national policies around our system. We are a country.”
2. Dig into contexts.
All of the international assessments generate mountains of data that most policymakers and researchers never see, from background surveys given to students along with the tests. PISA alone collects more than 1,000 elements describing school policies, instructional practices, and student behaviors, all of which can affect student learning.
Schleicher noted that the OECD has published the codes for all of its data elements and put both test results and background data online, but he said the data can still be complicated for researchers to use.
Schmidt warned, however, that the report left out a critical problem with international assessments: curriculum. The test administrators—the Organization for Economic Cooperation and Development for PISA, and the International Association for the Evaluation of Educational Achievement for TIMSS and PIRLS—determine the subject matter to be covered on each test, but it is difficult to tell how much of that content individual countries teach in the years before their students are tested.
“That’s the heart and core of it,” Schmidt said. “Comparisons really don’t make sense if you don’t know what the students cover in terms of say, mathematics. ... That’s a really critical issue to avoid misinterpreting the horse race ... It provides that context in which you can look at the top-achieving countries and ask what’s different.”
The report recommended that test groups augment their data on student and school backgrounds with outside data—for example, using data from the U.S. Census Bureau to supplement background data for the United States—to allow more consistent measures of indicators like a student’s socioeconomic level.
3. Use emerging technology.
Schleicher said PISA is already starting to respond to some of the criticisms of international tests leveled in the report. As of the 2018 administration, PISA has moved to computer-adaptive testing, which he suggested would give a better indication of what each student knows and can do, regardless of what country they live in.
Schmidt agreed. “That’s a new form of testing that would probably make the whole testing process more sensitive,” he said, “because you don’t waste time asking the kids questions they will never have a chance of answering, and spending more time asking questions around the knowledge they do know.”
The report suggested that as more international tests move to computer-based formats, their supervising organizations will also have the opportunity to rethink how they design test questions and structure the tests to provide more useful information to educators.
The National Academy of Education will discuss the findings of the report in more detail at a symposium in Washington later today.
A version of this news article first appeared in the Inside School Research blog.