Corrected: An earlier version of this story misidentified the lead author of a forthcoming study on how boys and girls differently approach essay questions on tests. She is Mo Zhang of the Cognitively Based Assessment of, for, and as Learning project at the Educational Testing Service.
Large-scale standardized tests have become a staple of school accountability, but they don’t give teachers much information to improve students’ learning strategies in the classroom.
That’s changing, as researchers on some of the leading national and international assessments work to pull more data about students’ learning strategies and skills from summative tests.
“It’s one thing to know the answer to a question, but it’s another to get together information about the process a student goes through to get to that answer,” said Peggy G. Carr, the acting commissioner for the National Center for Education Statistics, which administers the National Assessment of Educational Progress, or NAEP, the congressionally mandated tests meant to take a national pulse on student achievement.
The upcoming NAEP in mathematics, reading, and writing for grades 4 and 8, for example, will get a better picture of how teachers can support learning in the classroom. NAEP joins the Program for International Student Assessment, international adult literacy tests, and other testing programs that are using new technology to squeeze more useful information out of standardized tests.
“When we rely solely on summative assessments of student performance, we’re failing to capture the full picture of the learning process,” said Erin Higgins, an education research analyst at the Institute of Education Sciences, the U.S. Department of Education’s primary research arm.
“We miss out on measuring the struggle a student has when she tries to overcome a misconception in learning science concepts, or the anxiety a student feels when he’s trying to solve a math problem—or the kind of gesturing and language that a teacher uses to draw attention to a particular thing on the board as she is teaching,” Higgins said at a research conference on the future of assessments last month at the National Center for Research on Evaluation, Standards, and Student Testing, held here at the University of California, Los Angeles. “These are all really useful sources of information, and information that up until now we really haven’t been able to measure accurately.”
What Is Process Data?
Process data—also called interaction, telemetry, clickstream or logfile data, among other names—are the traces a student leaves as he or she progresses through an assessment. In general, they fall into three categories: What a student does, in what order, and how long it takes to do it.
Carr and other researchers believe that, within a few years, computer-based assessments will be designed to include eye-tracking (to show where a student is looking for information), clickstream trackers (to follow students’ use of interactive tools and multiple web pages), or keystroke programs, among other tools.
Keystroke programs capture not only what students type, but the patterns and pauses in their typing. One study showed that students who used the delete key more often, a measure of their attempts to edit, had higher scores than students who did not delete as much.
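To make the idea concrete, here is a minimal sketch of how a keystroke log might be boiled down to the kinds of measures described above: how many keys were pressed, how many were deletions, and how many long pauses occurred. The event format, the `KeyEvent` and `summarize_keystrokes` names, and the two-second pause threshold are all assumptions made for illustration; they are not drawn from NAEP, ETS, or any testing program's actual software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    """One logged key press (hypothetical format, for illustration only)."""
    t_ms: int  # milliseconds since the test item opened
    key: str   # key label, e.g. "a", "Space", "Backspace"


# Keys counted as edits; an assumption, not a standard from any assessment.
DELETE_KEYS = {"Backspace", "Delete"}


def summarize_keystrokes(events, pause_ms=2000):
    """Reduce a keystroke log to counts of keys, deletions, and long pauses."""
    ordered = sorted(events, key=lambda e: e.t_ms)
    # Time between each key press and the next one.
    gaps = [b.t_ms - a.t_ms for a, b in zip(ordered, ordered[1:])]
    return {
        "keystrokes": len(ordered),
        "deletions": sum(e.key in DELETE_KEYS for e in ordered),
        "long_pauses": sum(g >= pause_ms for g in gaps),
        "elapsed_ms": ordered[-1].t_ms - ordered[0].t_ms if ordered else 0,
    }


# A made-up log: the student types "The", pauses, deletes a letter, types "at".
log = [
    KeyEvent(0, "T"),
    KeyEvent(180, "h"),
    KeyEvent(350, "e"),
    KeyEvent(3400, "Backspace"),
    KeyEvent(3600, "a"),
    KeyEvent(3800, "t"),
]
summary = summarize_keystrokes(log)
print(summary)
```

Counts like these say nothing on their own about why a student paused or deleted; in this sketch they are simply raw indicators that a researcher would still need to interpret against scores.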
Students’ test-taking behaviors—how often they use the backspace key, for instance, or how many keystrokes they use—are already being linked in studies to their performance on the NAEP 4th grade writing assessment.
Source: National Center for Education Statistics
A separate, forthcoming study of writing, led by Mo Zhang of the Cognitively Based Assessment of, for, and as Learning research project at the Educational Testing Service, found differences in how boys and girls approached essay questions. Female students edited more than male students, both in individual word choice and in cutting and pasting. Girls also took fewer pauses to plan the next sentence during writing. “We suspect they plan less because they have given more thought to what they were going to say before they started,” said Randy E. Bennett, the chairman in assessment innovation at ETS.
In another forthcoming study, John Sabatini of ETS found that students with a history of performing poorly on reading tests did better when they had to write a summary of a reading passage before answering multiple-choice questions on the content. Eye-tracking data showed that those students spent more time reading the initial text, and “the time spent looking back at the passage [while answering multiple-choice problems] went down,” Sabatini said. “They had built a mental memory model of the text; they were much more efficient.”
Eventually, Bennett said, teachers will be able to play back the timeline of a student answering an essay question, for example, to identify whether a low-scoring student had structural problems, misunderstood the question, or was distracted by irrelevant sources of information.
Within a particular domain, like science, the process data can show whether students are engaging in complex critical thinking about different sources of information, using surface keywords, or simply guessing to solve problems. PISA is using this process data to design questions to assess students’ problem-solving skills in science.
“We want to know that during a test, students are performing the kind of cognitive actions representative of what they are studying,” Sabatini said. “If you design the assessment so there’s a pathway through the screens in a problem-solving task, it becomes a very rich source” of information for teachers.
Tests associated with the two Common Core State Standards assessment consortia are designed to be able to collect some process data—such as time spent on different items and which test tools students use—but other tests are not collecting much process data yet.
For one thing, Carr noted that complex scenario-based test items, which provide the most information about how students think, cost 2½ times as much to develop as a traditional multiple-choice problem.
“It is very difficult, very challenging to capture these student actions and interface with the system,” Carr said.
Test developers must work with information-technology experts and content theorists to put together each question. And many computer-based tests are still designed like paper-and-pencil tests, which limits the ways students can use them.
“Traditional tests blocked everything they could onto one page, for efficiency of paper,” Sabatini said. “Now we can let people navigate across pages to see what they are seeing, what they are doing across multiple pages.”
Moreover, when it comes to measuring the higher-order skills called for in states’ next-generation reading, math, and science standards, Sabatini cautioned that the same data may mean different things in different domains. “When you are looking for reading fluency, more-rapid speeds are indicative of fluency. But there are other cases—in science, in other domains—where very smart, very careful problem solvers will have very slow reading speeds.”
Educators will also need more training on how to use data culled from students’ test-taking.
John Lee, a research scientist for CRESST who works with the U.S. Navy on simulation-based assessments, cautioned at the conference that it will be easy to flood educators with too much information. “We can grab all kinds of data from these simulators, but we really want to look at meaningful indicators that matter to what people do,” he said.
Coverage of the implementation of college- and career-ready standards and the use of personalized learning is supported in part by a grant from the Bill & Melinda Gates Foundation. Education Week retains sole editorial control over the content of this coverage.
A version of this article appeared in the October 05, 2016 edition of Education Week as Hunt Is On for Clues to Students’ Test-Taking Strategies