Some test questions are likely harder to answer on tablets than on laptop and desktop computers, presenting states and districts with a new challenge as they move to widespread online assessments.
Analyses by test providers and other organizations have pointed to evidence of small but significant “device effects” for tests administered in some grades and subjects and on certain types of assessment items.
The results in many cases do not follow a clear pattern. And the most comprehensive studies to date, analyses of 2014-15 results from the millions of students who took common-core-aligned exams designed by two major assessment consortia, concluded that overall test results were comparable, despite some discrepancies across individual test items.
But much remains uncertain, even among testing experts. A recent analysis commissioned by the Partnership for Assessment of Readiness for College and Careers, for example, found that test-takers in Ohio—home to 14 percent of all students who took the 2014-15 PARCC exams—performed significantly worse when taking the exams on tablets. Those students’ poor showing remains unexplained.
“Absolutely, this preliminary evidence leads to questions,” said Marianne Perie, the director of the Center for Educational Testing and Evaluation at the University of Kansas. “We’re so new into this and we need so much more research.”
In its June report titled “Score Comparability Across Computerized Assessment Delivery Devices,” the Council of Chief State School Officers offered four recommendations for states:
1. Identify the comparability concerns being addressed
From different devices to multiple test formats, a variety of factors can make student test scores not directly comparable to one another. To minimize potential threats, state officials first need to be clear about which of those factors they are dealing with.
2. Determine the desired level of comparability
For most states, this will mean “interchangeability,” in which test scores are reported without regard to the device a student used.
3. Clearly convey the comparability claim or question
In the contemporary testing environment, states may be wise to embrace some flexibility by claiming, for example, only that students took tests on the devices most likely to produce accurate results, rather than claiming that students would have received the exact same score no matter which device they used.
4. Focus on the device
When administering tests on different devices, it’s important to ensure that all devices meet recommended technical specifications, and that students are familiar with the device they will be using.
Source: Council of Chief State School Officers
The 2015-16 school year marked the first in which most state-required summative assessments in elementary and middle schools were expected to be given via technology. Over the past decade, states and districts have spent billions of dollars buying digital devices, in large measure to meet state requirements around online test delivery.
To date, however, relatively little is known about how comparable state tests are when delivered on desktop computers, laptops, tablets, or Chromebooks. The devices differ in screen size and in how students manipulate material (touchscreen vs. mouse, for example) and enter information (an onscreen vs. a detached keyboard, say), factors that could contribute to different experiences and results for students.
In an attempt to summarize research to date, the Council of Chief State School Officers released last month a report titled “Score Comparability Across Computerized Assessment Delivery Devices.”
“Device effects” are a real threat to test-score comparability, the report concludes, one of many potential challenges that state and district testing directors must wrestle with as they move away from paper-and-pencil exams.
From a practical standpoint, researchers say, the key to avoiding potential problems is to ensure that students have plenty of prior experience with whatever device they will ultimately use to take state tests.
Struggles in Ohio
In February, Education Week reported that the roughly 5 million students across 11 states who took the 2014-15 PARCC exams via computer tended to score lower than those who took the exams via paper and pencil. The Smarter Balanced Assessment Consortium, the creator of exams given to roughly 6 million students in 18 states that year, also conducted an analysis looking for possible “mode effects.”
In addition to looking for differences in scores between computer- and paper-based test-takers, both consortia looked for differences in results by the type of computing device students used.
Smarter Balanced has not yet released the full results of its study. In a statement, the consortium said that its findings “indicated that virtually all the [test] items provide the same information about students’ knowledge and skills, regardless of whether they use a tablet or other device.”
A PARCC report titled “Spring 2015 Digital Devices Comparability Research Study,” meanwhile, reached the same general conclusion: Overall, PARCC testing is comparable on tablets and computers.
But the report’s details present a more nuanced picture.
Numerous test questions and tasks on the PARCC Algebra 1 and geometry exams, for example, were flagged as being more difficult for students who took the tests on tablets. On the consortium’s Algebra 2 exam, some questions and tasks were flagged as being more difficult for students taking it on a computer.
The analysis of students’ raw scores also found that in some instances students would likely have scored slightly differently had they taken the exam on a different device. On PARCC’s end-of-year Algebra 1, geometry, and Algebra 2 exams, for example, students who used computers would likely have scored slightly lower had they been tested on tablets.
And most dramatically, the researchers found that students in Ohio who took the PARCC end-of-year and performance-based exams on tablets scored an average of 10 points and 14 points lower, respectively, than their peers who took the exams on laptops or desktop computers. The researchers concluded that those results were “highly atypical” and decided to exclude all Ohio test-takers (representing 14 percent of the study’s overall sample) from their analysis.
When Ohio’s results were included, though, “extensive evidence of device effects was observed on nearly every assessment.”
PARCC officials were not able to definitively say why tablet test-takers performed so poorly in Ohio. They speculated that the results might have been skewed by one large tablet-using district in which students were unusually low-performing or unfamiliar with how to use the devices.
Perie of the Center for Educational Testing and Evaluation said more data—including the full extent of the apparent device effect in Ohio—should have been presented to help practitioners draw more informed conclusions.
“Typically in research, we define our parameters before looking at the results,” Perie said. “If the decision to drop the anomalous state was made after looking at that data, that could be problematic.”
Screen Size, Touchscreen
In its roundup of research to date, meanwhile, the CCSSO noted a number of studies that have found some evidence of device effects. Among the findings: some evidence that students who take writing exams on laptops tend to perform slightly worse than peers who use desktop computers, and signs that students generally experience more frustration responding to items on tablet interfaces than on laptops or desktops.
The report also examines research on the impact of specific device features. Screen size, for example, was found to be a potential hurdle for students, especially for reading passages. Smaller screens that held less information and required students to do more scrolling led to lower scores, according to a 2003 study.
Touchscreens and on-screen keyboards, both features of many tablet devices, also appear to put students at a disadvantage on some test items. Technology-enhanced performance tasks that require precise inputs can be challenging on touchscreen devices, and students tend to write less—in response to essay prompts, for example—when using an onscreen keyboard.
Overall, said Perie, she would not go so far as to advise states and districts to avoid using tablets for online testing, but there are “absolutely some questions” about how students perform on tablets.
The CCSSO, meanwhile, offered an extended set of recommendations for states.
Ultimately, the group said, states and districts will want to be able to use test scores interchangeably, regardless of the device on which the exams are taken.
To be able to do so with confidence, they’re going to have to conduct in-depth analyses of their results in the coming years, said Scott Norton, the group’s strategic-initiative director for standards, assessment, and accountability. “Device comparability,” he said, “is definitely something that states should be paying attention to.”
Coverage of the implementation of college- and career-ready standards and the use of personalized learning is supported in part by a grant from the Bill & Melinda Gates Foundation. Education Week retains sole editorial control over the content of this coverage.
A version of this article appeared in the July 20, 2016 edition of Education Week as "Digital Device Choice Has Noticeable Impact on Test Performance."