Can State Tests Be Useful for Instruction and Accountability? (Opinion)

Rick Hess

Opinion Contributor, Education Week

Rick Hess is the director of Education Policy Studies at the American Enterprise Institute and the author of EdWeek’s Rick Hess Straight Up blog. He is the creator of the annual RHSU Edu-Scholar Rankings.

New Meridian is an assessment company that launched in 2016 with the goal of making tests more useful for educators and students. Today, it works with more than 2,500 districts in five states. Given the need for good measures of student progress and better instructional support, especially after devastating pandemic-era declines in learning, I thought it worth taking a closer look at its efforts. I talk with Arthur VanderVeen, the founder and CEO of New Meridian. Before founding New Meridian, he served as the executive director of college readiness at the College Board and as the executive director of assessment (and then chief of innovation) for the New York City Department of Education. Here’s what Arthur had to say.

—Rick

Rick: So, Arthur, what is New Meridian?

Arthur: New Meridian is a new kind of assessment-design company. We started in 2016 with a mission to develop the highest-quality assessments focused on critical thinking, deep engagement with meaningful content, and effective expression. We design assessments for grades 3-8 and high school, covering science, math, and English/language arts/literacy (ELA). We now work with over 2,500 districts in five states, plus the Bureau of Indian Education and the Department of Defense Education Activity (DoDEA), with our assessments administered to millions of students each year.

Rick: What brought you to this role?

Arthur: I started New Meridian in 2016 to offer technical and operational support to the then-Partnership for Assessment of Readiness for College and Careers (PARCC) states that were transitioning away from a strict consortium model to a more flexible collaboration. As a consortium, PARCC states had to agree on the same test design and use the same test-delivery vendor, making it difficult to be responsive to local needs; in the new model, New Meridian customized the test designs to individual states’ needs, drawing from a shared bank of high-quality test items to maintain economies of scale. We also made the items available to other states through a licensing model. My desire to support states with this innovative new approach grew from my days as director of assessments for the New York City Department of Education, where I was very familiar with the original conception of the PARCC assessments, and I didn’t want the states to lose the high-quality assessments as they faced political headwinds associated with the consortium. High-quality assessments have a significant impact on classroom instructional practice. If the state assessment measures the things that matter—critical thinking, deep engagement with meaningful texts, mathematical reasoning, and effective communication—teachers will focus on developing these critical skills in the classroom and more students will have access to a quality education. So I launched New Meridian to step in and help shepherd those states toward a more flexible operating model while maintaining the same commitment to high-quality assessment.

Rick: Let’s make this simple. What assessment problem are you all trying to solve?

Arthur: We are trying to reduce overall testing time while providing greater value to those who need it most: teachers and students. There’s no question that an effective teacher using a coherent and research-based curriculum is the greatest lever for accelerating student learning. We want to design assessments that reinforce that quality teaching, not disrupt it. That is why we are developing a new system of modular mini-assessments that can be flexibly aligned to a local curriculum to inform instruction while also providing a reliable, comparable measure of students’ mastery of grade-level standards. This approach will create a single system of assessments that gives teachers actionable instructional data, enables district administrators to monitor school performance and direct resources, and meets federal accountability requirements.

Rick: What’s distinctive about your approach?

Arthur: We’re taking a classroom-up approach to developing this system. You cannot squeeze instructionally valuable information out of an end-of-year summative assessment—it’s not designed for that. And current interim assessments are designed primarily to measure growth and predict performance on the end-of-year summative. That’s fine for the district administrator, but classroom teachers can’t use that data—it’s not aligned with how concepts are taught or detailed enough to inform the next steps. We’re using new test designs and psychometric models to glean more instructional value out of our short mini-assessments. Students have an opportunity to “level up” and continue to demonstrate their mastery throughout the year. Then, we pull all that data together into a comparable, reliable measure of grade-level mastery, without the redundancy or intrusion of a big end-of-year summative test. This approach will significantly reduce overall testing time and eliminate the lack of coherence between what our local assessments are telling us and what the state test is saying.

Rick: That’s intriguing, but can we get a little more concrete about these new test designs and psychometric models? Just how does this work?

Arthur: Our test designs focus on providing information that’s usable for instructional decisions. For every mini-assessment, we ask educators and learning experts, “What information about students’ learning progress on this set of concepts or skills would help you adjust your instruction?” We identify those “attributes” of learning development and write test questions that differentiate which ones students are mastering and which they are not. This may include relevant misconceptions that can block students’ learning progress. We then use sophisticated scoring models that combine information from multiple test questions and testlets to highlight which attributes need further instruction. For example, students typically learn proportional reasoning in middle school through multiple representations, including looking at patterns in data tables, determining the slope of graphs, writing equations, and interpreting verbal descriptions. Our testlets measure students’ learning progressions through these different dimensions of proportional reasoning, while allowing flexibility in how this foundational concept for algebra readiness is taught.

Rick: How do teachers get the classroom feedback? Can you talk a bit about the infrastructure at the local, classroom level?

Arthur: We are designing innovative new reports for teachers, students, and administrators that combine the instructionally focused information with ongoing, cumulative progress toward end-of-year standards mastery. Teachers use the diagnostic information to inform instructional decisions while they and their students monitor progress toward their end-of-year learning goals.

Rick: I know you all are currently piloting a few programs. Could you share a bit about those?

Arthur: We have partnered with two mission-driven, forward-thinking state education leaders—Superintendents Cade Brumley in Louisiana and Elsie Arntzen in Montana—who are challenging the status quo on behalf of their students. Both leaders are working to make assessments more accessible, more relevant, and more equitable by adopting a through-year model and aligning assessments more closely to the taught curriculum. This is our first pilot year, and it’s been really exciting. We convened teachers from both states together to write test items and we’ve been conducting empathy interviews, focus groups, and surveys to better understand what teachers, students, and families want in next-generation assessments. We’ve had strong philanthropic support to launch these pilots, and both states were also awarded Competitive Grants for State Assessment to fund a multiyear development program.

Rick: What kind of evidence is there regarding the efficacy of your assessments? What are you learning?

Arthur: We have a robust research program in place to validate both the instructional utility of our classroom reporting and the technical quality of the summative scores we will report for accountability purposes. It is critical that we do both well to achieve our goal of transforming state assessments. This year, we are piloting the test questions and blueprints and getting feedback on the design and usability of the system. We are analyzing the student test data to validate and refine our scoring models. For example, we are analyzing early student data to determine whether our scoring models can reliably differentiate the dimensions of proportional reasoning I mentioned earlier. As we get more data across larger populations of students, we will continue to refine our scoring models to support the instructional decisions teachers are making. This is a multiyear process, and we are excited to have state partners, technical advisers, researchers, and philanthropic support who are all committed to this journey. It’s critical because teachers and students need better classroom assessments that reinforce the curriculum and replace the end-of-year test, reducing overall testing time. This is our vision, and we are excited to be working with numerous partners who are also committed to this ambitious goal.

The opinions expressed in Rick Hess Straight Up are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.