A few weeks ago, I asked three questions about how confident we should be that the results of the new, quasi-national, computer-assisted Common Core tests will be valid and reliable enough to support stuff like teacher evaluation and school accountability. These are questions that I’d been publicly asking for several years with little result. I’m pleased to report that, in the last couple days, I’ve received serious responses from thoughtful executives at PARCC and SBAC. Today I’ll be publishing a response from PARCC’s Jeff Nellhaus and tomorrow I’ll publish one from SBAC’s Joe Willhoft. As I’ll discuss briefly on Thursday, the responses don’t fully satisfy me--I’m eager to ask follow-up questions and seek a few clarifications. But my primary aim was a more transparent discussion about how the Common Core effort is supposed to play out. As I’ve often said, we know vastly less than we should on this score. For that reason, I want to extend my appreciation to Jeff and Joe for their constructive, reasoned responses. Here’s what Jeff Nellhaus, PARCC’s director of policy, research, and design, had to say:
By Jeff Nellhaus
In his March 26 blog post, “Three Practical Questions About PARCC & SBAC Testing,” Rick Hess raises legitimate questions about how variation in the testing conditions will impact the validity and reliability of the results of PARCC and Smarter Balanced assessments. He asks how the PARCC and Smarter Balanced states can assure educators and policymakers that test results are valid and comparable given the variety of testing devices and testing conditions, and the long testing window.
As Director of Policy, Research and Design for PARCC, I take these questions seriously. The assessments have been designed to provide teachers, students, and parents with important information about where students are on the road to academic success and readiness for college and careers, and to provide information teachers can use to focus (or readjust) instruction to support student learning. But they will also be used in states to determine the extent to which schools and districts are improving student performance, to inform decisions about promotion and high school graduation, and to gauge educator effectiveness. Accordingly, the results of the assessments will need to meet a high bar for fairness and technical rigor.
Devices
Regardless of the mode of administration (paper vs. device) or the type of device students use (desktop vs. laptop vs. tablet), the content and constructs the tests are designed to measure are, with a few exceptions, the same. For example, a grade 4 computer-based math test will cover the same Common Core standards and include the same number of questions as the paper version of the assessment. Many of the questions will be very similar across the two versions, and the results may, in fact, be comparable.
However, I agree with Mr. Hess that PARCC states must know what impact, if any, varying conditions have on test results. That is why PARCC has done studies on device compatibility. In a summer 2013 study, students were asked to interact with a variety of PARCC items on tablets. They were asked to explain what they were seeing and understanding, and how they were approaching the solution for each item. In the current field tests, we are evaluating the effectiveness of changes made based on this 2013 study (e.g., changes were made so that students could more easily identify which points on a graph they had “activated” and worked with on the touch screen). We also will investigate whether student performance differs between groups of students taking the field tests on desktop computers and tablet devices. Another study will do the same for computer vs. paper. More studies are planned for the first full administration of the assessments in spring 2015.
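To make concrete what a device-comparability check of this kind might involve, here is a minimal sketch in Python. It is purely illustrative, not PARCC's actual analysis: the scores are invented, and it simply computes a standardized mean difference (Cohen's d) between a hypothetical desktop group and a hypothetical tablet group, where a value near zero would suggest the two groups performed comparably.

```python
import numpy as np

def mode_effect(scores_desktop, scores_tablet):
    """Standardized mean difference (Cohen's d) between two device groups.

    A simple first check for a device effect: an effect size near zero
    suggests the two groups performed comparably on the same items.
    """
    a = np.asarray(scores_desktop, dtype=float)
    b = np.asarray(scores_tablet, dtype=float)
    # Pooled standard deviation across both groups
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                        / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Hypothetical field-test scores for two groups of students (made-up numbers)
desktop = [412, 430, 398, 445, 421, 407, 433, 419]
tablet = [408, 427, 401, 440, 418, 405, 429, 416]
print(f"Device effect size (Cohen's d): {mode_effect(desktop, tablet):.3f}")
```

In practice a comparability study would rely on matched or randomly equivalent samples and item-level analyses rather than a single mean comparison, but the underlying question is the same: how large is any score difference attributable to the device alone?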
If no significant differences are observed, the results of all forms of the test will be reported on the same scale. If the research shows differences in performance that can be attributed solely to the mode of administration or the device students are using, PARCC will need to consider alternatives for reporting the results. One alternative could be to report the results on different scales and establish concordance tables between those scales, much as is done to compare results on the SAT and ACT. Leadership from the PARCC states is actively discussing these alternatives and others with the PARCC Technical Advisory Committee, which is made up of 14 national experts in tests and measurement.
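For readers curious what a concordance table amounts to, the sketch below shows the basic equipercentile idea behind such tables: a score on one scale is mapped to the score on the other scale that sits at the same percentile rank. This is a toy illustration with invented score distributions, not PARCC's linking procedure; actual concordance work involves carefully designed equating studies and smoothing.

```python
import numpy as np

def concordance_table(scores_a, scores_b, points):
    """Equipercentile-style concordance: map selected scores on scale A to scale B.

    Each score on scale A is matched to the scale-B score with the same
    percentile rank -- the basic logic behind SAT/ACT concordance tables.
    """
    a = np.sort(np.asarray(scores_a, dtype=float))
    b = np.asarray(scores_b, dtype=float)
    table = {}
    for x in points:
        pct = np.searchsorted(a, x, side="right") / len(a) * 100  # percentile rank on A
        table[x] = float(np.percentile(b, pct))                   # score at same percentile on B
    return table

# Hypothetical score distributions from a paper form (A) and a computer form (B)
paper = np.random.default_rng(0).normal(500, 40, 2000)
computer = np.random.default_rng(1).normal(510, 45, 2000)
for a_score, b_score in concordance_table(paper, computer, [440, 500, 560]).items():
    print(f"paper {a_score} -> computer {b_score:.0f}")
```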
Testing conditions
No two classrooms or sets of conditions are identical, but effective testing calls for best efforts to minimize differences in testing conditions, as Mr. Hess rightly explains. However, contrary to his assertion, students will take the PARCC tests in their own schools, in familiar conditions. The vast majority of participating schools have the technology they need to test students in their own classrooms or computer labs. For the few that do not, paper-and-pencil versions are still an option. Schools are responsible for creating testing conditions consistent with PARCC’s protocol for standardized administration.
Testing window
A final question raised by Mr. Hess--the duration of the testing windows--touches on another aspect of comparability. The PARCC summative assessments include two components, the Performance-Based Assessment and the End-of-Year Assessment, designed to be administered after approximately 75 percent and 90 percent of the school year’s instruction, respectively. (The Performance-Based component is given earlier to allow time for scoring the robust writing and extended math problems before the end of the school year.) Because schools in some states start in early August and in others as much as a month later, it is necessary to provide a longer testing window. This actually supports comparability, because it allows students in all states to test after roughly the same amount of instructional time.
Mr. Hess has raised the right questions--questions that we have been asking, researching, and working to answer for over a year. When results from the first PARCC assessments are released in September 2015, teachers, researchers, policymakers, parents, and others can be assured that the results for all students will be valid, reliable, and useful both for improving instruction and for use in various accountability systems.