Reading the TIMSS Results
Why the Good News May Not Be So Good
Today, the National Center for Education Statistics released the results of the 2007 Trends in International Mathematics and Science Study. TIMSS is a highly regarded international assessment, and 2007 was the fourth time it was given. TIMSS assesses the performance of students in the 4th and 8th grades and provides an international benchmark for judging how our students perform compared with students in 35 other countries at grade 4 and 47 other countries at grade 8.
A superficial reading of this report could lead the reader to believe that the United States is doing well in science and math. But this would be a mistake—the United States is doing far worse internationally than TIMSS indicates. I will focus on the mathematics assessment to illustrate why.
The Apparent Good News
Both 4th and 8th graders in the United States scored above the TIMSS international average: our 4th graders achieved an average score of 529 points and our 8th graders scored 508, compared with an international average that is set at 500 for each grade. U.S. scores were higher than the average scores in about two-thirds of the countries in the 4th grade math assessment and in more than three-quarters of the countries at the 8th grade. In addition, U.S. students’ scores have increased significantly since 1995, when TIMSS was first given.
TIMSS also classifies students into four levels of achievement: advanced, high, intermediate, and low, reflecting different levels of mathematical skills and knowledge. Again, at first glance, the United States looks good. At the 4th grade, 10 percent of our students performed at the advanced level, which was twice the international median. At the 8th grade, 6 percent of our students were in the highest category, three times the international median.
To summarize the good news: U.S. students scored higher on average than students in most participating nations, our scores are increasing, and the share of our students doing math at the advanced level is two to three times the international median. But there’s a skunk at the garden party (in fact, there are several).
TIMSS is an international comparison, and while many countries participate in it, the list of participants differs considerably from the list of countries that took part in the other leading international assessment, the Program for International Student Assessment, or PISA. PISA is an assessment of 15-year-old students sponsored by the Organization for Economic Cooperation and Development. The 30 countries that make up the OECD represent the largest and most advanced economies in the world, and they reflect our trading partners and competitors better than the list of TIMSS participants does. Only about half the OECD countries took part in the 4th grade TIMSS, and only about a third of them participated in the 8th grade assessment.
The OECD does allow “partner countries” to participate in PISA, and indeed the number of partner countries nearly equals the number of member countries, but the organization calculates its PISA average based only on the 30 member countries.
In contrast, the TIMSS international average is based on all participating countries, which include a dozen or so OECD countries plus some high-performing non-OECD jurisdictions such as Chinese Taipei, Singapore, and Hong Kong SAR. But the TIMSS average also includes many less-developed countries, such as Jordan, Romania, Morocco, and South Africa.¹ Including these low-performing countries in the calculation drives down the international average, which makes our students’ relative performance look better.
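The arithmetic behind this point can be illustrated with a minimal sketch. The country names and scores below are invented placeholders, not TIMSS or PISA data, and the sketch assumes a simple average of country means in which every country counts equally, regardless of population.

```python
from statistics import mean

# HYPOTHETICAL country averages, invented for illustration only.
# These are NOT actual TIMSS or PISA results.
core_group = {"Country A": 540, "Country B": 525, "Country C": 510, "Country D": 495}
low_scoring_additions = {"Country E": 420, "Country F": 390}

# Assume we are "Country C"; our own score never changes below.
our_score = core_group["Country C"]


def unweighted_average(scores: dict[str, float]) -> float:
    """Average of country means, counting every country equally
    (no weighting by population or by number of students tested)."""
    return mean(list(scores.values()))


avg_core = unweighted_average(core_group)
avg_expanded = unweighted_average({**core_group, **low_scoring_additions})

print(f"Average, core group only:    {avg_core:.1f}")
print(f"Average, with low scorers:   {avg_expanded:.1f}")
print(f"Our margin vs. core average: {our_score - avg_core:+.1f}")
print(f"Our margin vs. expanded avg: {our_score - avg_expanded:+.1f}")
```

With these made-up numbers, the comparison country goes from 7.5 points below the core-group average to 30 points above the expanded average, even though its own score is identical in both comparisons.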
We can compare TIMSS with the most recent PISA, given in 2006, to see this. Remember that PISA is an assessment of 15-year-olds, so the closest comparison is with the 8th grade TIMSS.
Recall that our 8th grade students scored 508 on the TIMSS math assessment, significantly higher than the TIMSS international average of 500. On PISA, however, U.S. students scored 24 points below the OECD average in math. The picture also changes when we look at our highest-performing students. On TIMSS, the United States looks pretty good, with three times the international median percentage of 8th graders performing at the advanced level. But if we shift to PISA, we again find the United States lagging behind our OECD peers. In 2006, the cut point marking the top 10 percent of U.S. students (the 90th percentile) was 593, below the OECD average of 615. More striking, 23 of the 30 OECD countries had higher 90th-percentile cut scores than the United States. Moreover, only 1.3 percent of U.S. students reached the highest proficiency level in 2006 PISA math, half the OECD average and in the same range as Greece, Mexico, Portugal, and Turkey.
These points suggest a weakness in TIMSS that should act as a corrective to the relatively positive numbers reported today. Despite this flaw, TIMSS provides a great deal of useful information on student performance. Unfortunately, much of that information should raise alarms. The new TIMSS report presents data on the relative performance of black, Hispanic, and white students and on the performance gap between students in high-poverty and low-poverty schools. Many data sources support such comparisons within the United States, but TIMSS adds international data that can help us gauge the size of our internal challenges. In the accompanying charts (Figures 1 and 2), I repeat some of the information that appears in the report but add data from the National Assessment of Educational Progress and from PISA.
As in today’s TIMSS report, the differences shown are reported as “effect sizes.” While this term is likely to be unfamiliar to most readers, one way to think about effect sizes is as “standardized differences”: the gap between two group averages divided by a measure of how widely scores vary (a standard deviation), which makes comparison across different assessments possible. TIMSS, PISA, and the National Assessment of Educational Progress all use different scales, so we need a way of expressing differences in a common metric, and effect size does just that. There is no hard standard for deciding whether an effect size is big or small, but a common rule of thumb is that an effect size of 0.2 indicates a small effect, 0.5 a medium one, and 0.8 a large one.
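A minimal sketch of the calculation follows, assuming the common definition of an effect size as the difference between two group means divided by their pooled standard deviation (Cohen’s d). The exact standard deviation used in the TIMSS report is not stated here, so treat the denominator as an assumption; the function names and all scores in the example are hypothetical, chosen only to show that the same standardized gap can come from two tests reported on different scales.

```python
import math


def effect_size(mean_a: float, mean_b: float,
                sd_a: float, sd_b: float,
                n_a: int, n_b: int) -> float:
    """Standardized difference (Cohen's d): the gap between two group means
    divided by their pooled standard deviation.

    ASSUMPTION: the pooled-SD denominator is one common convention; the
    TIMSS report's exact formula may differ."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def rule_of_thumb(d: float) -> str:
    """Label the size of an effect using the 0.2 / 0.5 / 0.8 rule of thumb."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# HYPOTHETICAL summary statistics for two tests on different scales:
# Test 1 uses a scale with a standard deviation of 100, Test 2 one of 35.
d_test1 = effect_size(550, 500, 100, 100, 1000, 1000)
d_test2 = effect_size(267.5, 250, 35, 35, 1000, 1000)

print(f"Test 1 effect size: {d_test1:.2f} ({rule_of_thumb(d_test1)})")
print(f"Test 2 effect size: {d_test2:.2f} ({rule_of_thumb(d_test2)})")
```

Because both results are expressed in standard-deviation units, the two hypothetical tests produce the same effect size of 0.5 (a medium effect by the rule of thumb), even though their raw score gaps of 50 points and 17.5 points look very different.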
Turning first to Figure 1, we see that the difference between the United States and Hong Kong, the highest-performing jurisdiction on the 4th grade TIMSS math test, is quite large: an effect size of 1.1. To further gauge the size of this difference, the next bar in the figure shows the standardized difference between Massachusetts, the highest-performing state on NAEP, and Mississippi, the nation’s lowest-performing state. The gap between the United States and Hong Kong is even larger than the gap separating these two American states.
Using this standardized effect-size measure, we can also get a better sense of another internal challenge facing the United States. The gap between the average performance of black and white students on TIMSS is almost as large as the gap between the United States and Hong Kong, and the gap between students in the most-affluent American schools and those in the least-affluent ones is almost 50 percent larger.
Figure 2 reports differences at grade 8 and adds the difference between the United States and South Korea, the highest-performing country in math in the 2006 administration of PISA. The figure again shows a large gap between the United States and the highest-performing nation on both TIMSS and PISA. Using these international differences as a benchmark again shows us the magnitude of some of our internal challenges. For example, the difference between Massachusetts and Mississippi is larger than the difference between the United States and Korea on PISA, and only about 10 percent smaller than the difference between the United States and Chinese Taipei on TIMSS. The gaps between black and white students in the United States and between students in high- and low-poverty schools are larger than the international gaps, and the gap between white and Hispanic students is the same size as the gap between the United States and Korea on PISA.
Challenges at Home and Abroad
International student assessments such as TIMSS give the United States an opportunity to compare itself with other countries. We do better on TIMSS than on PISA, but this is a function of which countries participate in each, and we should not let the relatively good TIMSS results lull us into complacency.
Even on the relatively easier playing field of TIMSS, we trail far too many countries, both in overall math performance and in the performance of our best students. While we have an intuitive sense of how far Mississippi lags behind Massachusetts, the data presented here show that the United States lags behind the top-performing countries in TIMSS by even more. And when we look at the performance of different student groups within the United States, we get a picture of how hard it will be to close achievement gaps, which are about as large as, or even larger than, the gap between the United States and the top-performing countries in TIMSS.
¹ To allow comparisons between the 2007 results and earlier ones, the scores of students who participated in 2007 are scaled to be comparable with scores from prior administrations of TIMSS. The international average of 500 was established based on the 1995 TIMSS, which even then included a large number of low-performing, less-developed countries. If more low-performing countries continue to join TIMSS, the actual average for a given year’s test will fall below 500 (indeed, in 2007, the average across all countries, unweighted by population, was 473 for 4th grade math and 452 for 8th grade math).
Vol. 28, Issue 15