Test Designers Seek Help of Students, One at a Time
Pondering a math problem while she swings her sneakered feet from a chair, 12-year-old Andrea Guevara is helping researchers design an assessment that will shape the learning of 19 million students.
The 8th grader, who came to the United States from Ecuador three years ago, is trying out two ways of providing English-language support on a computer-based test. First, she does a few problems that display Spanish translations of the English instructions. Then she tries a few written only in English, but with pop-up windows that open on the screen and show translations of unfamiliar words.
Three researchers watch Andrea closely. They note which words she clicks on to activate the “pop-up glossary.” They watch how she responds to the bilingual instructions. Since Andrea has been encouraged to think aloud while she’s solving the problems, researchers hear as well as see how the features of the different test items help or hinder her.
Held at a middle school in Waterbury last week, the session spotlighted an important but little-known piece of the test-making process, known as cognitive labs. With their intimate scale and their dive into a student’s experience with the test, cognitive labs allow scientists to get inside students’ heads and use what they learn to craft easy-to-use questions and tasks.
Developing for Many
The Waterbury session was a tiny part of a sprawling project to design tests for the Common Core State Standards in mathematics and English/language arts. All but four states have adopted the standards. Two federally funded groups of states are working on such tests; the cognitive labs are being conducted by one of those groups, the 25-state Smarter Balanced Assessment Consortium.
Students in grades 3-8 and 11 in those states are slated to take the tests in 2015, but since the standards span grades K-12, and assessments have a potent influence on instruction, all 19 million students in SBAC states stand to be affected by the new tests.
That’s also true for the 25 million students in the other state consortium designing such tests, the Partnership for Assessment of Readiness for College and Careers, or PARCC.
Smarter Balanced is working its way through 945 cognitive-lab sessions in about a dozen states nationwide. As part of SBAC’s item-development contract with the test-maker CTB/McGraw-Hill, experts from the American Institutes for Research, in Washington, are looking for feedback on 20 questions that will inform the way test items are designed.
“We want to make significant changes in the way assessment items look, and we don’t want to make those changes without actually seeing how items function when they’re put in front of kids,” said Shelbi Cole, SBAC’s director of math. “This helps us know earlier in the process so we don’t develop a bunch of items that don’t measure the new standards the way we want.”
Long a tool of psychological research, cognitive labs found a place in educational assessment design in the 1980s. The qualitative information they yield about subjects’ experiences with test items complements the quantitative data produced by larger-scale pilot tests and field tests, which are more geared toward gauging items’ reliability and validity. In developing its tests, Smarter Balanced is using the cognitive labs early in the process, to influence the design of items that will be field-tested in 2014.
Many of the questions at the heart of the Smarter Balanced cognitive labs focus on the interaction of students with technology, since the group’s tests will be computer-based and computer-adaptive. Do students feel as comfortable, for instance, responding to questions on tablets as they do with pencil and paper? How do their responses on tablets differ from those on PCs?
Some questions reflect the newer types of items envisioned for the tests. How long, for instance, is long enough for a performance task, which requires more-complex, extended research, writing, or problem-solving? What kinds of instructions do students need when presented with a multiple-choice item that allows them to choose more than one answer?
Key questions explore ways to provide accommodations for students with disabilities and those learning English. Magda Chia, SBAC’s director of support for underrepresented students, said it’s crucial that students from all walks of life, at all skill levels, find the test items equally accessible.
For the cognitive labs, the group reached out via emails to school districts, mailings to churches and YMCAs, and craigslist ads to recruit a broad array of students: those from big cities and small towns, and from all points on the income scale; those who are more and less at ease with technology; students who speak English fluently and those who struggle; children with disabilities and those without. Participating in the 90-minute sessions entitles students to a $50 gift card; their parents get $10 to cover transportation costs.
On researchers’ minds for the Waterbury session were two questions: which kinds of translations work best for students still learning English, and whether students find a tablet’s on-screen keyboard as easy to use as a traditional keyboard. Andrea was chosen to help them gain insight into the first, since she is still working to master English. Her older sister, Melanny, was on board to help with the second.
Elena Saavedra, trained at the AIR to administer the cognitive lab, began each protocol with a brief orientation, telling the girls that the session wasn’t about grading their responses but about designing a test for “students from many districts” with items that work well and make sense. As Ms. Chia and AIR researcher Kristina Swamy watched, Ms. Saavedra asked the girls to think aloud as they worked.
Working on a laptop computer, Andrea tackled three math problems that displayed Spanish translations for each paragraph of English instructions. She read the problem aloud, as it was written in English, then switched to Spanish as she thought out loud while solving it, using pencil and scratch paper to do the calculations.
Then Andrea moved on to the three problems that used pop-up glossaries to translate words or concepts students might find unfamiliar. One was about a student who had to paint ceramic tiles blue and green in an art class. If Andrea had hovered her cursor over the word “tile” and the phrase “art class,” and clicked on them, she would have seen little windows open up with Spanish translations. But she didn’t click on them, even though she told Ms. Saavedra in a postlab interview that she didn’t understand the word “tiles.”
She used the glossaries more as she went along, however. In the third problem, which asked her to calculate the cost of building a sidewalk of specific dimensions, she clicked on “contractor” and saw its translation. But she also clicked on words and phrases for which the item had no pop-up glossary: “fee,” “charge,” and “Prospect Road,” the location of the fictional sidewalk project. She told Ms. Saavedra afterward that she found the item difficult and that the glossaries were “kind of” helpful. When she couldn’t understand a word in one of the problems, she said in Spanish, she tried her best to deduce its meaning from context.
Reflecting later, the research team wondered whether additional pop-ups might be needed and whether extraneous details in that item would distract some students unnecessarily.
“We could have said that a woman is building a house, and needs a sidewalk built, and we don’t really need the detail that it’s on Prospect Road,” Ms. Chia said. This same protocol will be tried on several dozen other students, though, before any conclusions are drawn.
Also, the test items used in the cognitive labs have not gone through “bias and sensitivity” reviews, a standard part of test development in which items are examined for factors that could upset or distract students, or put them at a disadvantage because of cultural, social, or other references. Had the sidewalk item been through such a review, some revisions probably would have been made, said Ms. Chia. But she added that unreviewed items were used deliberately in the cognitive lab to get additional feedback on the kinds of terms or phrases that could trip students up.
Some of what researchers gleaned from Andrea’s work with the translations and pop-up glossaries came not just from listening to her, but from observing.
The fact that she clicked on the pop-ups more as she went along, they said later, suggested that it took a little while for her to get comfortable with that option. That got them wondering about the possible need for teachers to introduce the pop-ups to students during the year, so they are familiar with them by the time they take the test. They tucked that thought away for later, once the feedback from all students in the experiment can be compiled and analyzed.
With the translation exercise completed, Ms. Saavedra turned to 15-year-old Melanny, and introduced her to the tablet computer she’d be using for a set of English/language arts questions.
For the first, Melanny used the tablet to read a short argument about whether students should be permitted to go on the Internet in their classrooms. Then she used the on-screen keyboard to write a brief counterargument in a rectangle drawn underneath the prompt. When she began to compose her response, the 10th grader laid the tablet on the table and typed with one or two fingers of her right hand, leaving the left in her lap.
The second question asked Melanny to use a mechanical keyboard connected to the tablet to write her response to another prompt. This time, she typed with both hands. For the third question, Melanny was allowed to choose whether to use the on-screen keyboard or the mechanical one. She chose the mechanical keyboard.
Interviewing her afterward, Ms. Saavedra asked which keyboard she preferred. Melanny demurred, glancing down at the table and saying she had no preference. Researchers said that it’s not uncommon to get those kinds of responses, as students try to be accommodating. But the adults know that valuable information lies underneath those answers. So Ms. Saavedra gently pressed Melanny: But if you had to choose, she said, which one would you pick?
“If I had to choose, this one,” the teenager said, quickly this time, pointing to the mechanical keyboard.
Asked why, Melanny was quick to offer several reasons. For the on-screen keyboard, “you use only two fingers,” she said, while on the mechanical version “you use all of them.” In addition, Melanny explained, when the on-screen keyboard displays on the tablet, it crowds the space provided to type the answer, making that process difficult. She also complained that the on-screen keyboard display made it necessary to keep moving the screen back and forth horizontally with her finger so she could read the paragraph-long prompt.
The research team thanked Melanny, handed her a Visa gift card, and reunited her and her sister with their parents, who were waiting in the hallway. In the coming weeks, the notes and audio recordings detailing the two sessions will be combined with the responses of more than 900 other students and analyzed for lessons about test-item design.
Vol. 32, Issue 14, Pages 1, 20-21
Published in Print: December 12, 2012, as Test Designers Tap Students for Feedback