International Exams Yield Less-Than-Clear Lessons
Differing Demographics, Politics, Cultural Norms Complicate Understanding
Almost every time the results of an international test of student achievement are released to the world, the reaction among the American public and policymakers is like that of a parent whose child just brought home a disappointing report card.
Elected officials and academic experts question where U.S. students fell short: Was it our curriculum, our teaching, or a confluence of out-of-school factors? What did other nations do well? And what changes to American classrooms would help U.S. students make strides on the next round of tests?
Despite such reaction, many observers—even those who interpret the test scores very differently—say that American policymakers need to guard against simplistic interpretations of the results of PISA, TIMSS, or PIRLS, the acronyms for three high-profile tests given periodically to samples of students in dozens of countries. Similarly, researchers and test experts urge U.S. officials to be cautious in the lessons they draw from the impressive scores of high-performing Asian and European nations.
Back in 1983, the seminal report A Nation at Risk warned that U.S. schools were slipping, a trend that posed economic risks for the country, the authors said. It’s a theme that has re-emerged in force today.
The National Commission on Excellence in Education, which issued the report a quarter-century ago, saw Germany and Japan as the United States’ chief rivals, but policymakers had only limited means to judge the panel’s gloomy hypothesis about American students’ subpar skills.
Today, international measures provide leaders with test scores and other data. Yet determining what those results say is a vexing task, complicated by variables in demographics, policies, and social and cultural norms.
The tests offer little concrete information about why some countries score so well, making it difficult to mine lessons for school policy and practice, according to Daniel M. Koretz, a Harvard University researcher.
“We shouldn’t go assuming that just because [high-performing] Finland did something, we can just adapt what they do to our schools and we know how it’s going to turn out,” Mr. Koretz said. The comparisons “are very good for helping us set expectations,” he said, “but they don’t tell us what’s working or give us a new or better tool.”
Applying lessons from high-scoring nations to the United States requires a careful analysis of how those countries educate their students, both in and out of school, figuring out what strategies are useful to American schools, and testing proposed changes before scaling them up, said Andreas Schleicher of the Organization for Economic Cooperation and Development.
“The temptation is to copy and paste education systems of high-performing countries into your own,” said Mr. Schleicher, the head of education indicators for the Paris-based OECD, which oversees the Program for International Student Assessment, or PISA.
“But the best way to use the results,” he said, “is to look at the drivers that make a particular education system successful or less successful, and think about how to configure those drivers in your own national context.”
Even as international assessments have garnered increased attention among U.S. researchers and policymakers, there are broad disagreements about what those results say about American students’ performance.
One reason for those disputes is that U.S. test results look very different, depending on the exam. American students, for instance, fare reasonably well on the Trends in International Mathematics and Science Study, or TIMSS, scoring above the 2007 international averages in 4th and 8th grade math and science by statistically significant margins. American 4th graders also scored high on the Progress in International Reading Literacy Study, or PIRLS, notching marks well above the international average.
But on PISA, the United States scored statistically below the 2006 international averages for industrialized countries in both science and math literacy.
TIMSS, like the domestic National Assessment of Educational Progress, primarily tests students’ knowledge of school-based curriculum, though it does so across countries.
The goal of PISA is different. It measures the skills students have acquired and their ability to apply them to real-world contexts. Unlike TIMSS, PISA evaluates not only in-school learning, but also abilities students have picked up outside of school. PISA also tests students of a specific age, 15, rather than a grade; most U.S. test-takers, though not all, are 10th graders, and the grade levels of tested students can vary from country to country, federal officials say.
While the PISA results are more discouraging, that test is arguably the most relevant standard for judging U.S. students, said Gary W. Phillips, a vice president and chief scientist at the Washington-based American Institutes for Research. TIMSS groups the United States with many developing nations with far fewer resources, he noted. PISA, by contrast, compares American students against only relatively wealthy, industrialized nations.
“What you should be doing is comparing yourself to your economic competitors,” said Mr. Phillips, who has studied the performance of U.S. states and cities internationally. “To me, the OECD”—whose members are all industrialized countries—“is a good average to be comparing yourself against.”
When it comes to gauging the ability of American students against foreign peers, he said: “It depends on your goal. We should be discouraged if our goal is to be at the top level. Being in the middle of the pack is where we show up.”
The United States participates in three major international exams, which test students of different ages in different subjects for different purposes.
TIMSS: The Trends in International Mathematics and Science Study gauges students’ math and science skills at two grade levels. Thirty-six jurisdictions took part at the 4th grade level in 2007, and 48 participated at the 8th grade level that year. Both industrialized and developing nations take part. Like the primary U.S. test, the National Assessment of Educational Progress, or NAEP, TIMSS measures students’ knowledge of school-based curriculum.
PISA: The Program for International Student Assessment tests math, science, and reading skills that students pick up in and out of school. It assesses students at a specific age, 15, rather than at a grade, and measures their ability to apply knowledge to real-world contexts. Thirty industrialized nations and 27 other jurisdictions took part in 2006.
PIRLS: The Progress in International Reading Literacy Study evaluates 4th grade reading comprehension, in both literary and informational skills. In 2007, 40 jurisdictions took part. U.S. scores were mostly unchanged from 2001, though the United States surpassed a majority of the participating nations.
Mark S. Schneider, a former commissioner of the National Center for Education Statistics, also sees reasons for U.S. policymakers to be discouraged. In a Commentary essay published in Education Week in December, he examined the “effect sizes,” or standardized statistical differences, of the gaps between the United States and top-performing countries on TIMSS and PISA. He then compared those effect-size margins with those separating high- and low-scoring states on the primary U.S.-based test, NAEP.
For instance, the distance separating the United States from high-performing countries, such as South Korea, on the PISA math exam is comparable to the one separating Mississippi and Massachusetts, the states with the lowest and highest average scores on the 8th grade math NAEP, according to Mr. Schneider, now a colleague of Mr. Phillips’ at the AIR.
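To illustrate how an effect size lets gaps on differently scaled tests be set side by side, here is a minimal sketch. It assumes the common standardized-mean-difference form (the gap in average scores divided by a pooled standard deviation); the article does not say which formula Mr. Schneider used. The function name and every number below are hypothetical, chosen only to make the arithmetic easy to follow, and are not actual PISA or NAEP results.

```python
from math import sqrt


def effect_size(mean_a, mean_b, sd_a, sd_b):
    """Standardized mean difference between two groups.

    Assumption: an equal-weight pooled standard deviation. This is one
    common convention, not necessarily the method used in the essay.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Hypothetical scores on an international-style scale (NOT real results).
country_gap = effect_size(mean_a=550.0, mean_b=500.0, sd_a=100.0, sd_b=100.0)

# Hypothetical scores on a different, state-level scale (NOT real results).
state_gap = effect_size(mean_a=290.0, mean_b=272.0, sd_a=36.0, sd_b=36.0)

# The raw gaps differ (50 points vs. 18 points), but both equal half a
# standard deviation once each is expressed on its own scale.
print(country_gap, state_gap)
```

In this made-up case, a 50-point gap on one scale and an 18-point gap on another are the same size in standardized terms, which is the kind of comparison the essay describes.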
Others, however, say policymakers and the news media misinterpret the data and vastly overstate American students’ shortcomings on international exams.
In a recent essay, Hal Salzman, a professor of public policy at Rutgers University, in New Brunswick, N.J., noted that the United States produced a smaller percentage of high-performing science students on the 2006 PISA than countries like Finland, the United Kingdom, and Australia. Yet the United States’ population, at 307 million, dwarfs that of those countries, and so in raw numbers, it produces many more top-tier students—“the lion’s share of the world’s best”—than its so-called competitors, wrote Mr. Salzman, with co-author Lindsay Lowell.
Mr. Salzman and Mr. Lowell of Georgetown University, in Washington, also have argued that U.S. schools are, contrary to popular opinion, producing sufficient numbers of talented K-12 students to satisfy America’s economic needs in math and science. Students tend to drop out of the pipeline later, in higher education and the workforce, they say.
“The tests do not support the conclusions that are being made” about U.S. students’ lack of skill, Mr. Salzman said in an interview. “We’re producing students on par with anybody else in the world.”
As evidence, he cites the strong performance of two states, Massachusetts and Minnesota, which took part in the 2007 TIMSS and scored above international and U.S. averages in almost every math and science category. Those states have also fared well on NAEP. ("Standards Help Minn. Vie With Top Nations," Jan. 21, 2009.)
“Let’s look at Massachusetts and Minnesota before we look at Finland,” Mr. Salzman said. “One would think that there’s more transferability in what they’ve done than there is in looking at foreign countries.”
Mr. Schneider, despite his views of the gaps between U.S. and top-performing foreign students, also cautioned against using the tests to draw broad conclusions about school policies.
TIMSS and PISA, like NAEP, are relatively “blunt instruments,” the former statistics chief said. They do not produce longitudinal data, tracking the same students over time, which would be useful in pinpointing the particular policies influencing student performance.
As it now stands, international exams cannot explain whether it was a high-performing country’s math curriculum, its teacher salaries, or another factor that produced strong results. Only high-quality studies can reveal that, he said.
The tests yield “important hypotheses, which we should be testing more rigorously,” Mr. Schneider said. A dozen factors could be behind a nation’s test score, he added.
Limitations aside, many observers say U.S. policymakers can, in fact, draw important lessons from international test scores.
Several observers say they are encouraged by U.S. Secretary of Education Arne Duncan’s statements that he would like to see states benchmark their assessments against international standards. Mr. Phillips, of the American Institutes for Research, has conducted studies that compare individual U.S. states’ and cities’ test results against those of foreign nations by linking NAEP and TIMSS scores.
This is the fourth and final installment of a yearlong, occasional series examining the impact of the 1983 report A Nation at Risk.
The first installment was published on April 23, 2008, as the 25th anniversary of the report was being marked. It explored concerns about global competition and efforts by policymakers and educators to benchmark American performance against that of students in competitor nations.
The second, published September 24, 2008, looked at U.S. progress toward finding more time for children’s learning.
The third installment, published February 25, 2009, focused on charter quality and came a quarter-century after A Nation at Risk declared that a "rising tide of mediocrity" was eroding U.S. education.
Researchers can go much further in creating such state-to-nation comparisons, Mr. Phillips argues. In fact, he expects to release a study soon that assigns comparable letter grades to countries, states, and school districts through a statistical “crosswalk” between NAEP and TIMSS data.
While Mr. Salzman believes many of the worries about poor U.S. test performance are exaggerated, he sees a lesson in PISA and TIMSS data that he believes should not be glossed over. Although the United States produces sizable numbers of top-tier students, it also leaves vast numbers in the low-performing category—a poor showing that has consequences for the economy, he says.
“You need a good, middle-skilled [population of workers], or else innovations won’t work anywhere,” Mr. Salzman said. In technology, medicine, and other fields, he said, “the value of innovation depends on your ability to implement it.”
Many state leaders are taking heed of the demand for U.S. schools to be world-class. The Council of Chief State School Officers and the National Governors Association are working on developing international benchmarks of what students should know and embedding them in state standards. The project includes detailed comparisons of standards in some U.S. states with those of several Asian countries. The benchmarks could be assessed either through state participation in the international tests, or by including similar measures in state or national assessments, like NAEP, according to CCSSO Executive Director Gene Wilhoit.
“It is important to us to look at ... giving students educational opportunities of similar rigor and with similar expectations of the highest-performing nations in the world,” Mr. Wilhoit said. “To outperform other countries is not as much the goal as it is to make sure we are, in this country, providing ... what we predict are going to be the essential knowledge and skills we think these students are going to have to have in the future.”
For Massachusetts, measuring the skills of its students on TIMSS did not come cheap: The state spent $600,000 to participate, officials said.
To Mitchell D. Chester, the state commissioner of education, the value of international testing depends largely on policymakers’ willingness to probe beneath the raw scores to see what the data say about teaching, the performance of subgroups of students, and other factors.
His state, for instance, is planning a detailed analysis of its TIMSS scores, focusing on performance gaps between boys and girls, and the content of Massachusetts’ math and science courses, compared with those of foreign nations.
Although it is important to consider the major political and cultural differences between the United States and high-performing countries in weighing American students’ test performance, Mr. Chester believes it is just as important that policymakers not use those dissimilarities as excuses.
“It’s easy to say, ‘They can get away with that in Finland,’” but not in the United States, he said. “That can be a limiting perspective. We can often be too dismissive.”
Vol. 28, Issue 29, Pages 1, 16-17. Published in Print: April 22, 2009, as International Exams Yield Less-Than-Clear Lessons