Adaptive Testing Evolves to Assess Common-Core Skills
The goal is to build better measures of student skills and knowledge
When Delaware switched to computer-adaptive testing for its state assessments three years ago, officials found the results were available more quickly, the amount of time students spent taking tests decreased, and the tests provided more reliable information about what students knew—especially those at the very low and high ends of the spectrum.
But the path to launching those tests involved a significant effort to educate students, parents, and teachers, a sizeable technology investment by the state, and the development of hundreds of test items for every exam.
As many states move to put in place online testing tied to the Common Core State Standards in 2014-15, at least 20 states have indicated they plan to use new computer-adaptive versions of the tests, and they’re looking at states like Delaware to learn some lessons.
“Adaptive testing is really beneficial and can pinpoint a student’s learning level more closely,” says Gerri Marshall, the supervisor of research and evaluation for the 15,000-student Red Clay Consolidated School District in Wilmington, Del., which piloted such tests.
Nationally, two coalitions have received federal funding to develop English/language arts and mathematics tests for the common standards. Both coalitions—the Smarter Balanced Assessment Consortium and the Partnership for Assessment of Readiness for College and Careers, or PARCC—have said their assessments will feature high-tech, interactive questions that incorporate video and graphics and are designed both to identify what students know and to be more engaging.
Both assessments will be given online, but Smarter Balanced will use adaptive testing, while PARCC will use what are known as fixed-form tests, which feature set questions that generally do not change.
Only a handful of states—including Delaware, Hawaii, and Oregon—are now using adaptive testing on a widespread basis. Even supporters acknowledge challenges to its implementation and use, considering that many school districts are currently doing little, if any, testing online.
“It’s a big philosophical shift for people,” says John Jesse, the director of assessment and accountability for the Utah department of education, which is in the process of developing its own computer-adaptive tests for the common core. “If your district is still using paper, shifting to online is big, and then shifting to adaptive testing might be too much of a move all at once.”
Seeking Greater Precision
So what exactly is the difference between a traditional test, which presents a student with a set number of test items that don’t change during test-taking, and adaptive testing?
Testing experts say that traditional, or fixed-form, exams work well with the majority of students, who hover around the level the assessment is seeking to evaluate. Test questions are developed to appeal to most students and can assess how much those students know.
However, those tests serve students at the far ends of the spectrum (high achievers and struggling students) less well, because they do not let teachers identify exactly what material those students have or have not mastered.
With exceptional students, a fixed test can’t determine just how extensive their knowledge may be, and for struggling learners, it can’t determine how far behind they may be. A teacher won’t know exactly how deep the gaps in students’ learning of certain concepts run, because the test questions don’t reach far enough in either direction.
“The range of proficiency among kids in a grade is huge,” says Jon Cohen, the executive vice president and director of assessment for the Washington-based American Institutes for Research, which is already delivering statewide adaptive tests in several states and has been selected by the Smarter Balanced consortium to do pilot and field testing and to create the adaptive-test algorithm.
“With a typical test, a kid who is struggling is not going to see many items they can get right, and a kid at the top is not going to see many items they’ll get wrong,” he says. “Kids on the ends get a less precise score.”
Adaptive tests operate from a large test-item bank. For example, for a 40-question test, an adaptive test bank might contain 800 items, Cohen says.
An algorithm guides the computer as it picks questions based on the answers given to previous questions to pinpoint a student’s skill and knowledge level. Typically, a student will get about half the questions offered by the computer correct, whether he or she is a high, middle, or low performer, since the questions are tailored for that student’s particular level.
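The selection loop described above can be illustrated with a minimal sketch. It is not the Smarter Balanced or AIR algorithm, whose details the article does not give. It assumes a Rasch (one-parameter logistic) model, a hypothetical bank of 800 items with evenly spaced difficulties, a simple shrinking-step update of the ability estimate, and a simulated student who answers correctly whenever an item's difficulty is at or below 1.0. All function and variable names are invented for illustration.

```python
import math

def p_correct(ability, difficulty):
    """Rasch-model probability that a student answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def pick_next_item(ability, bank, used):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - ability))

def run_adaptive_test(answer_fn, bank, n_items, step=1.0):
    """Give n_items questions, moving the estimate up or down after each answer."""
    ability = 0.0            # assumed starting point: middle of the scale
    used = set()
    administered = []        # (difficulty, correct) for each item given
    for k in range(n_items):
        i = pick_next_item(ability, bank, used)
        used.add(i)
        correct = answer_fn(bank[i])
        administered.append((bank[i], correct))
        delta = step / (k + 1)               # steps shrink as evidence accumulates
        ability += delta if correct else -delta
    return ability, administered

# Hypothetical bank: 800 items with difficulties spread evenly from -4 to 4.
bank = [-4.0 + 8.0 * i / 799 for i in range(800)]

# Simulated student (an assumption): right on anything at or below difficulty 1.0.
estimate, administered = run_adaptive_test(lambda d: d <= 1.0, bank, n_items=40)
share_correct = sum(1 for _, c in administered if c) / len(administered)

print(f"estimated ability: {estimate:.2f}")
print(f"share of items answered correctly: {share_correct:.2f}")
```

Under the assumed model, an item whose difficulty matches the student's ability has a 50 percent chance of being answered correctly, which is why a test that keeps targeting the current estimate leaves students at every level with roughly half of their answers right.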
“With a computer-adaptive test, the percent correct is no longer relevant,” says Tony Alpert, the chief operating officer for Smarter Balanced. “The adaptive test is always challenging for every student, and we need to help people understand that.”
Computer-adaptive assessments aren’t scored solely on the basis of how many right or wrong answers a student gets. A student’s score depends both on the number of items he or she got right and the difficulty of the items presented. Early trials, or field tests, present items to representative samples of students to evaluate the difficulty of each item in the pool and to translate that into values that will provide a score, Cohen says.
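To see how a score can depend on item difficulty as well as on the number of right answers, consider the following sketch. It assumes a Rasch model with item difficulties already known from field testing, and it estimates ability by maximum likelihood using bisection. The article does not say which scoring model any state or consortium uses, so the model, the function names, and the difficulty values here are all illustrative assumptions.

```python
import math

def p_correct(ability, difficulty):
    """Rasch-model probability that a student answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, lo=-6.0, hi=6.0, iters=60):
    """Maximum-likelihood ability for a list of (difficulty, correct) pairs.

    Finds the ability at which the expected number of right answers equals
    the observed number. Needs at least one right and one wrong answer;
    otherwise the estimate simply runs to the edge of the search range.
    """
    n_right = sum(1 for _, correct in responses if correct)

    def excess(theta):
        # Observed right answers minus the number expected at ability theta.
        return n_right - sum(p_correct(theta, d) for d, _ in responses)

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid      # student did better than expected: ability is higher
        else:
            hi = mid
    return (lo + hi) / 2.0

# Two hypothetical students, each with 2 of 4 items right.
easy_items = [(-2.0, True), (-1.5, True), (-1.0, False), (-0.5, False)]
hard_items = [(0.5, True), (1.0, True), (1.5, False), (2.0, False)]

easy_score = estimate_ability(easy_items)
hard_score = estimate_ability(hard_items)

print(f"2 of 4 right on easier items: {easy_score:.2f}")
print(f"2 of 4 right on harder items: {hard_score:.2f}")
```

Both students answer half their items correctly, but the one who saw harder items receives the higher estimate, which is the sense in which percent correct alone stops being meaningful on an adaptive test.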
Personalization Improves Security
The biggest advantage to a computer-adaptive test, experts say, is the ability to evaluate all students at their own levels. Because of that, students often report that they are more engaged with the test and find it more interesting, says Dirk P. Mattson, the executive director of K-12 assessment for the Educational Testing Service, who is based in the nonprofit testing company’s San Antonio office. ETS, which has been hired by Smarter Balanced to develop several aspects of the computer-adaptive test, also produces the GRE, an adaptive graduate school admissions test.
“There’s a belief that this provides a more rewarding testing experience for the test-taker,” Mattson says. “A struggling student doesn’t need to be beaten over the head encountering lots of questions they can’t handle, … and the student who is strong might welcome an additional challenge.”
In addition, because each test for each student is personalized and there are so many test questions in the bank, security risks are lessened, says Doug Kosty, the assistant superintendent for assessment and information services for the Oregon department of education. His state has used computer-adaptive testing for nine years.
It’s unlikely that students sitting near each other would encounter the same test questions in the same order, for example. A student “can’t go out on the playground and compare notes on question 14,” Kosty says. “Kids are basically guaranteed not to have the same test.”
Some educators who have used adaptive testing say the test window is shorter since students don’t always have to answer as many questions. In Delaware, students used to spend multiple hours taking state reading and math tests, says Michael Stetter, the director of accountability and resources for the Delaware department of education. The computer-adaptive tests shrank that time to one hour for reading and one hour for math, he says, making it easier for schools to schedule test times around computer labs. “We’re getting a more precise estimate of ability with the same or fewer questions,” Stetter says.
However, Smarter Balanced’s tests are expected to take 10 to 13 hours, depending on grade levels. Because of concerns from states, the coalition is now developing a shorter version it says will produce comparable results.
In addition, users of computer-adaptive testing laud the immediacy of the assessment results, which typically are posted when a student finishes the test, giving teachers the opportunity to adjust their instruction more quickly based on the results. Officials from both coalitions say some results will be available almost immediately or within days, while results from sections that contain more writing and constructed response may take several weeks.
But in the field, implementation of computer-adaptive tests can pose problems. Much like the PARCC tests, the Smarter Balanced tests will be given online, and that means schools will need enough devices and bandwidth. Delaware had to allocate funds to buy additional servers for districts, and the state distributed 10,000 netbooks to get schools ready; the state also had to redesign training for teachers who were going to be test administrators. Districts are raising concerns that lengthy testing windows will tap out their bandwidth for long periods of time and that they may not have enough devices with the right specifications to run the test.
Computer-adaptive tests can also be costly to develop since so many test items are needed. “The early years of computer-adaptive testing are extremely expensive,” Stetter says. However, since his state’s development of initial computer-adaptive tests, costs have dropped, he says, as test banks can be used for a long time.
Smarter Balanced estimates that once its adaptive tests are fully developed, its test bank will contain at least 30,000 items across all grades.
“Once you have an adaptive-testing pool, you can continue to run it for a long period of time, so there are a lot of efficiencies gained,” says Walter “Denny” Way, the senior vice president of psychometric and research services for the education publisher Pearson, based in London.
Smarter Balanced received $160 million and PARCC received $170 million in federal grants to develop the common assessments. Once the tests are ready, states will be expected to pay for them, but just how much they will pay and how those payments will be structured are still being worked out.
‘Two Viable Solutions’
Experts seem to agree that computer-adaptive testing works well with multiple-choice questions, or one-word-response questions, but there are differing opinions about how it does with longer answers or with essays. That makes computer-adaptive testing more suited to some subjects than others. Oregon, for example, uses a nonadaptive writing assessment that takes students three hours to complete.
“Things that are essays or that contain more complicated projects that need to be evaluated through human judgment really can’t be administered [through computer-adaptive testing] in this situation,” says John Mazzeo, the vice president of research and development for ETS, based in Princeton, N.J.
Despite those limitations, Alpert says the English/language arts and literacy component of the Smarter Balanced assessments will be adaptive. The only exceptions will be a handful of performance tasks, which may be longer activities that take place in the classroom or offline.
As states and schools get ready to address the challenges of adaptive testing, training students, educators, and even parents becomes increasingly important, says Steve Slater, the lead psychometrician for the Oregon education department.
Melissa Fincher, the associate superintendent for assessment and accountability in Georgia, a state that has joined the PARCC coalition, says she appreciates the fact that the federal government has financed the work of both coalitions.
“The jury is still out, and I see this as an opportunity to look large-scale at the best way to assess students,” she says. “I don’t see this as an either-or situation. I’m pleased we have two viable solutions in the works.”
Vol. 06, Issue 01, Pages 12-16