Back in March, I took to the blog to ask PARCC and SBAC, the two consortia building and field testing the Common Core assessments, three questions about concerns I had regarding the testing. A few weeks later, representatives from PARCC and SBAC took me up on the offer, and on the whole I found the exercise fairly heartening (even if I’m left with additional questions).
Last week, Wayne Camara, senior vice president of ACT, emailed and offered responses from ACT as well. Given that ACT currently serves 9 states, I was greatly encouraged by its willingness to be transparent and wanted to give ACT the forum for the day. Without further ado, here’s Wayne.
Written by Wayne Camara, Senior Vice President of Research, ACT, Inc.
Let me begin my response by providing a brief description of ACT Aspire, a vertically scaled and articulated standards-based system of assessments for students in grades 3-10. Benchmarked to empirical evidence of college and career readiness, ACT Aspire includes summative assessments and optional periodic (interim and classroom) assessments across five subject areas (English, mathematics, reading, science, and writing). The assessments contain a mixture of selected-response questions, constructed-response tasks, and technology-enhanced items. The assessment framework and items are aligned with the Common Core State Standards in English language arts and mathematics. They also draw on ACT’s strong longitudinal data from students who have transitioned to college or postsecondary career training programs, as well as extensive surveys of college and K-12 faculty, to identify the skills and knowledge that are most essential for college and career readiness.
As of the end of May, ACT Aspire had been administered in a fully operational environment to more than 1.1 million students in both computer-based and paper-and-pencil modes. Operational scores for each grade and subject include a STEM score and ACT Readiness Indicators that are empirically linked to ACT’s College Readiness Benchmarks, which indicate likely readiness for credit-bearing first-year college coursework in English, mathematics, science, and the social sciences.
It is important to understand that the issues raised in your March 26 post cannot be immediately resolved for ACT Aspire or any other new assessment program. Continuous research is required to ensure the highest levels of psychometric quality for tests used in high-stakes decisions. ACT is committed to ensuring these and other issues are adequately addressed in a variety of research studies. Bearing this in mind, the following are ACT’s replies to the three questions from your post:
How [does ACT Aspire] compare the results of students who take the assessment using a variety of different devices?
ACT Aspire is currently approved for delivery via computer or paper and pencil. Both versions are built to the same content specifications, including use of constructed-response tasks. The only difference in design is the absence of technology-enhanced items on the paper-and-pencil version. Instead, to ensure comparability, the paper-and-pencil version contains selected-response items that test the same constructs as the technology-enhanced items. We have also employed rigorous assessment design requirements and equating methods to provide similar score meaning across modes.
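For readers unfamiliar with what equating involves, here is a minimal sketch of one standard textbook approach, mean-sigma linear equating under a random-groups design. The function and the sample scores below are hypothetical illustrations of how scores from two forms or modes can be placed on a common scale; this is a generic example, not ACT Aspire’s actual equating procedure.

```python
# Illustrative sketch only: mean-sigma linear equating for randomly
# equivalent groups. All data below are made up for demonstration.
from statistics import mean, pstdev

def linear_equate(x_scores, y_scores):
    """Return a function mapping scores on form X onto the scale of form Y.

    Assumes the two examinee groups are randomly equivalent, so matching the
    score means and standard deviations aligns the two scales.
    """
    mu_x, sd_x = mean(x_scores), pstdev(x_scores)
    mu_y, sd_y = mean(y_scores), pstdev(y_scores)
    slope = sd_y / sd_x
    intercept = mu_y - slope * mu_x
    return lambda x: slope * x + intercept

# Hypothetical example: paper-and-pencil scores (X) and computer scores (Y).
paper = [18, 22, 25, 27, 30, 33]
computer = [19, 23, 27, 28, 32, 35]
to_computer_scale = linear_equate(paper, computer)
print(to_computer_scale(25))  # a paper score of 25 expressed on the computer scale
```

Operational programs typically use more sophisticated designs (e.g., anchor items or equipercentile methods), but the basic goal is the same: a given reported score should indicate the same level of achievement regardless of which form or mode a student took.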
ACT and Pearson (which handles technology and delivery for ACT Aspire) conducted extensive studies examining student performance on individual items as well as score comparability in both modes. The results of these studies showed that reported scores indicate the same level of student achievement regardless of testing mode.
Nevertheless, a single set of studies may not be adequate to ensure comparability, so we continue to monitor testing modes in operational administrations. We are capturing the operating system on which a student tests and will continue to monitor metadata as well as test scores on different operating systems.
Preliminary research using other devices has also been conducted, including several small device studies and cognitive labs that examined item rendering and how students at different grades and in different subjects responded to items in various modes. Additional research is planned for the fall.

ACT realizes that schools differ tremendously in their technology and that there is therefore significant demand for flexibility in the choice of device, administrative mode, and administrative timeline. However, score comparability is essential to supporting the appropriate interpretation of test scores and providing the necessary forms of validation evidence. Because ACT Aspire’s summative assessment will largely be used for accountability purposes, ACT demands strong evidence of comparability across devices, modes, and settings not just for the average student but for all groups of students. Statistical adjustments alone may not adequately compensate for significant differences across technology devices and conditions, and they should not be the sole basis for defending score comparability when potential differences in response times, response processes, and testing conditions can also differentially affect specific groups of learners.

ACT has a long history of research on testing accommodations and validity, which is one reason why ACT’s test scores are relied on by colleges and universities for admissions, course placement, and remediation decisions. We have implemented an effective set of accommodations in our national programs and support state-specific accommodations (e.g., in statewide administrations of the ACT college readiness assessment, commonly known as “the ACT”). ACT is also committed to ensuring the security of test materials on different devices before those devices are approved for use in testing.
In the future, once mode comparability becomes less of a concern, ACT Aspire will include additional innovative item types. Finally, practice items are made available to schools and students to mitigate the concern that students might otherwise take the test on an unfamiliar device.
How [does ACT Aspire] account for vastly different testing conditions?
ACT conducted an extensive field test in 2013 that was used to test system requirements and establish testing conditions for this year’s first operational administration of ACT Aspire. We send staff to observe testing at different locations, provide “hotlines” for schools to call prior to or during testing with any issues, and regularly capture irregularities for further investigation and iterative improvements in the assessment system. We have issued more specific requirements in many areas to ensure comparability, standardization, and fairness, such as time requirements for each test and policies on the use of calculators.
ACT has also developed a number of administrative documents--including system requirement documents, administration manuals, and supervisor manuals--for participating schools to use in delivering comparable testing experiences, especially with respect to factors that have been shown to affect test scores most strongly, such as pre-exposure to items.
How [does ACT Aspire] account for the fact that we’re apparently looking at testing windows that will stretch over four or more weeks?
Not all schools run on the same calendar: many states begin school in early or mid-August, whereas others begin after Labor Day. Some secondary schools implement a variety of block-schedule formats, while others stick to a traditional school schedule. This past winter also highlighted the disruptions that weather can introduce into school schedules. School calendars are often set well in advance, and many state assessments employ a testing window for paper-and-pencil testing.
In order to meet schools where they currently are, both in terms of calendar and in terms of the ability to test all of their students in multiple subjects, a testing window is warranted. However, an extended testing window--of 10 or 12 weeks, say, for schools on the same schedule--would have students at different schools testing after vastly different numbers of instructional days. Such a design would clearly raise concerns about the fairness and equivalence of scores within a district or state. ACT Aspire therefore uses two four-week testing windows, which provide both sufficient flexibility and much less variation in instructional time and opportunity to learn.

Ultimately, however, as with all state and district testing programs, it is the state or district that makes the final decision, and ACT Aspire technical staff are available to advise on these and other issues. We used a testing window in the initial pretesting and in the spring 2013 studies, as well as in the spring 2014 operational administration. We are now examining responses from students at the beginning and end of the testing windows to determine whether issues or confounding factors emerge.