Some test questions are likely harder to answer on tablets than on laptop and desktop computers, presenting states and districts with yet another new challenge as they move to widespread online assessments.
Analyses by test providers and other organizations have pointed to evidence of small but significant “device effects” for tests administered in some grades and subjects and on certain types of assessment items.
The results in many cases do not follow a clear pattern. And the most comprehensive studies to date—analyses of 2014-15 test results from millions of students who took the Common Core-aligned tests of the country’s two major assessment consortia—concluded that overall test results were comparable, despite some discrepancies across individual test items.
But much remains uncertain, even among testing experts. A recent analysis commissioned by the Partnership for Assessment of Readiness for College and Careers, for example, found that test-takers in Ohio, home to 14 percent of all students who took the 2014-15 PARCC exams, performed significantly worse when taking the exams on tablets. Those students’ poor showing remains unexplained.
“Absolutely, this preliminary evidence leads to questions,” said Marianne Perie, the director of the Center for Educational Testing and Evaluation at the University of Kansas. “We’re so new into this, and we need so much more research.”
This school year marks the first in which most state-required summative assessments in U.S. elementary and middle schools were expected to be administered via technology. Over the past decade, states and districts have spent billions of dollars buying digital devices, in large measure to meet state requirements around online test delivery.
To date, however, relatively little is known about how comparable state tests are when delivered on desktop computers, laptops, tablets, or Chromebooks. Each type of device has different screen sizes and ways of manipulating material (e.g., touchscreen vs. mouse) and inputting information (e.g., onscreen vs. detached keyboard), factors that could contribute to different experiences and results for students.
In an attempt to summarize research to date, the Council of Chief State School Officers (CCSSO) released this week a report titled “Score Comparability Across Computerized Assessment Delivery Devices.”
“Device effects” are a real threat, the report concludes, one of many potential challenges that state and district testing directors must wrestle with as they move away from paper-and-pencil exams.
From a practical standpoint, researchers say, the key to avoiding potential problems is to ensure that students have plenty of prior experience with whatever device they will ultimately use to take state tests.
“Devices will not be regarded as interchangeable as if they were No. 2 pencils,” the CCSSO report reads. “Rather, states will and should accept that familiarity and fluency with a particular device used to administer the assessment is a factor that impacts the ability to produce the most accurate estimate of a student’s true achievement.”
From “Mode Effects” to “Device Effects”
In February, Education Week reported that the roughly 5 million students across 11 states who took the 2014-15 PARCC exams via computer tended to score lower than those who took the exams via paper-and-pencil. The Smarter Balanced Assessment Consortium, the creator of exams given to roughly 6 million students in 18 states that year, also conducted an analysis looking for possible “mode effects,” but has not yet released the results.
In addition to looking for differences in scores between computer- and paper-based test-takers, both consortia also looked for differences in results by the type of computing device that students used.
Smarter Balanced did not provide the full results of its study. In a statement, the consortium said that its findings “indicated that virtually all the [test] items provide the same information about students’ knowledge and skills, regardless of whether they use a tablet or other eligible device.”
A PARCC report titled “Spring 2015 Digital Devices Comparability Research Study,” meanwhile, reached the same general conclusion: overall, PARCC testing is comparable on tablets and computers.
But the report’s details present a more nuanced picture.
Numerous test questions and tasks on the PARCC Algebra 1 and Geometry exams, for example, were flagged as being more difficult for students who took the tests on tablets. On the consortium’s Algebra 2 exam, meanwhile, some questions and tasks were flagged as being more difficult for students testing on a computer.
The analysis of students’ raw scores also found that in some instances, students would have likely scored slightly differently had they taken the exam on a different device. For PARCC’s end-of-year Algebra 1, Geometry, and Algebra 2 exams, for example, students who used computers would likely have scored slightly lower had they tested on tablets.
And most dramatically, the researchers found that students in Ohio who took the PARCC end-of-year and performance-based exams on tablets scored an average of 10 points and 14 points lower, respectively, than their peers who took the exams on computers. The researchers concluded that those results were “highly atypical” and decided to exclude all Ohio test-takers (representing 14 percent of the study’s overall sample) from their analysis.
When Ohio’s results were included, though, “extensive evidence of device effects was observed on nearly every assessment.”
PARCC officials were not able to definitively say why tablet test-takers performed so poorly in Ohio. They speculated that the results might have been skewed by one large tablet-using district where students were unusually low-performing or unfamiliar with how to use the devices.
Perie of the Center for Educational Testing and Evaluation said more data—including the full extent of the apparent device effect in Ohio—should have been presented to help practitioners draw more informed conclusions.
“Typically in research, we define our parameters before looking at the results,” Perie said. “If the decision to drop the anomalous state was made after looking at that data, that could be problematic.”
Screen Size, Touchscreen Interfaces Present Challenges
In its roundup of research to date, meanwhile, the CCSSO noted a number of studies that have found some evidence of device effects. Among them:
- A 1996 study by the Educational Testing Service and a 2005 study of the National Assessment of Educational Progress, both of which found that students taking writing exams on laptops tended to perform slightly worse than peers who used desktop computers.
- A 2013 study by ed-tech company Renaissance Learning that found that students in some grade spans who took its STAR Reading and STAR Math formative assessments via a web application (on computer) tended to perform better than those who took the exams via the company’s iPad application.
- A 2014 study by researchers at Questar Assessment that concluded that students generally experience more frustration responding to items on a tablet interface than on laptops or desktops.
- Another PARCC study, of 2013-14 field test results in a single Massachusetts district, that found numerous signs of device effects across tablets and computers. Among them: 37 percent of the tasks on PARCC’s 4th grade math exam that year were found to be more difficult on tablets, and scores on PARCC’s 4th grade English/language arts exam were lower among tablet test-takers.
The report also looked at research on the impact of specific device features. Screen size, for example, was found to be a potential hurdle for students, especially for reading passages. Smaller screens that held less information and required students to do more scrolling led to lower scores, according to a 2003 study.
Touchscreens and on-screen keyboards (both features of many tablet devices) also appear to put students at a disadvantage on some test items. Technology-enhanced performance tasks that require precise inputs can be challenging on touchscreen devices, and students tend to write less (in response to essay prompts, for example) when using an onscreen keyboard.
Overall, Perie said, she would not go so far as to advise states and districts against using tablets for online testing, but there are “absolutely some questions” about how students perform on tablets, and state testing directors should be paying close attention to them.
CCSSO also offered an extended set of recommendations for states, including pushing for clarity about the claims that officials hope to make regarding their results. There’s a big difference, for example, between saying that all students would have received exactly the same score had they used a different device and saying that efforts were made to ensure that all students took the exam on the device most likely to help them get the highest score.
But ultimately, the group acknowledged, states and districts will want to be able to use test scores interchangeably, regardless of the device on which the exams were taken.
To be able to do so with confidence, they’re going to have to conduct in-depth analyses of their results in the coming years, said Scott Norton, the group’s strategic initiative director for standards, assessment, and accountability.
“Device comparability is definitely something that states should be paying attention to,” he said.
Photo: Seventh graders at Marshall Simonds Middle School in Burlington, Mass., look at a PARCC practice test to give them some familiarity with the format before field-testing in 2014 of the computer-based assessments aligned with the common core. —Gretchen Ertl for Education Week-File
A version of this news article first appeared in the Digital Education blog.