If Students Aren't Trying on International Tests, Can We Still Compare Countries' Results?
Many students seem to be blowing off a major international exam, leading some researchers to argue that the results paint a distorted picture of where countries stand in education rankings.
Worldwide, a large share of students skip questions, spend too little time answering them, or quit the Program for International Student Assessment (PISA) test early. As a result, a handful of countries rank lower overall on PISA than they might if their students applied themselves, according to a provocative new study.
PISA, administered every three years to 15-year-olds around the world, is one of the primary international tests used to take a snapshot of student performance and to inform cross-country policy comparisons. The new research raises fresh questions about such comparisons.
“PISA is taken as the gold standard, and countries live and die by their PISA rankings,” said Kala Krishna, a liberal arts professor of economics at Pennsylvania State University, who conducted the research with two colleagues. “The United States has been very upset because it doesn’t do so well on PISA, and that has led to a lot of concern that maybe we’re not using our money wisely.”
In response to a series of emailed questions, an official at the Organization for Economic Cooperation and Development, the Paris-based group that runs PISA, faulted what she called the study’s “sensational tone.”
Shifts in ranking would be far smaller, or even insignificant, if the measurement error associated with the exam, which is based on samples of students, were taken into account, said Miyako Ikeda, a senior analyst at the OECD’s directorate for education and skills.
Examining Online Test Data
Krishna conducted the study with two colleagues, S. Pelin Akyol of Bilkent University in Turkey and Jinweng Wang, also of Penn State. The study was released as a working paper this month by the National Bureau of Economic Research and has not yet been peer reviewed.
For the study, the authors mined keystroke data (how long students spent on each test question and how they responded to various kinds of questions) from the 2015 online administration of PISA, which was taken by students in more than 58 countries, including the United States. The test measures skills and knowledge in areas such as math, reading, and problem-solving, but PISA officials emphasize a different subject in each administration. In 2015, the emphasis was on science, and the new research is based on data from that portion of the test.
Using data supplied by the OECD, the researchers devised criteria to identify students who weren’t putting their best foot forward on the test. For example, they flagged students whose keystrokes showed that they answered questions so quickly that they couldn’t possibly have read them.
They also flagged test-takers who quit the exam early or stopped answering questions even though plenty of exam time remained. Overall, the study found that students skipped harder or open-ended questions more often than easy or multiple-choice ones.
Both Krishna and Ikeda noted that those criteria are judgment calls, since they use test-taking behavior as a proxy for student effort.
Certain student and contextual attributes were linked with higher incidences of goofing off, the study found. Wealthier students and lower-skilled students tended to take the exam less seriously. Testing context seemed to matter, too. Countries in which students reported sitting for more “high stakes” exams had a higher proportion of students blowing off PISA, which does not carry stakes for students.
In all, the proportion of students exhibiting what the researchers called “nonserious” behavior on the test ranged from a low of 14 percent in Korea to a high of 67 percent in Brazil. In the United States, about 23 percent of test-takers fell into that category.
A Change in Rankings?
To find out how lack of student effort affected each country’s ranking, the researchers used a statistical technique to plug in the missing answers, based on how each student likely would have performed given his or her other answers and those of similarly skilled students.
Portugal, which ranked 31st on PISA, might have jumped to 19th or 16th place if its students had made every effort on the test, depending on which model was used. Sweden would potentially have moved up as many as 11 spots and Norway as many as nine, the researchers found.
High rates of nonserious behavior didn’t always translate into large ranking shifts. Most countries at the very top and bottom of the rankings would stay there or move only a few slots. If all students had done their best, the researchers estimated, the United States’ overall rank in science, 27th, might have improved by about two to five places, depending on the statistical technique used.
The OECD’s Ikeda pushed back on those estimates. If all students in all countries made a serious effort on the test, then the changes in ranking would be far less dramatic than the country-by-country examples the researchers chose to highlight, she said. Many if not all of those changes wouldn’t be significant when the sampling error associated with the test is taken into account, she contended.
For that reason, she rejected the researchers’ call to publish a set of adjusted rankings alongside the regular ones.
“That is not to say that effort is not important—we agree that it significantly affects absolute performance—but it does not affect relative performance rankings,” she said.
The Downside of Low Stakes
Regardless of how the researchers’ thought experiment is interpreted, the results highlight two testing truisms.
First, comparisons of international student performance are fraught with caveats, “but ifs,” and interpretive challenges.
And second, as students age, they don’t always take “low stakes” exams like PISA seriously—unlike tests that stand to affect their grades or access to college. The United States’ major low-stakes exam is the National Assessment of Educational Progress, given as a dipstick of student learning in America in much the same way that PISA is internationally.
NAEP officials have long struggled with student-participation rates on the 12th grade exam. But Peggy Carr, the associate commissioner for the assessment division of the U.S. Department of Education’s National Center for Education Statistics, which administers NAEP, said that the students who do take the exam seem to be doing their earnest best. The percentage of students skipping questions on the 12th grade exam in 2015 was comparable to the rates for 4th and 8th graders.
Forthcoming keystroke data from the 2017 12th grade NAEP, which was given on tablets instead of bubble sheets, should provide even richer information on how students are engaging with test items, she said.