State Test Results Are In. Are They Useless?

By Catherine Gewertz, October 21, 2021

Educators have been bracing for them, and now they’re here: the first state test results since COVID-19 interrupted K-12 schooling. Districts, states, and schools are poring over the data from spring 2021 tests, hoping to understand exactly how—and how badly—the pandemic affected children’s learning.

But even though educators are hungry for insight, assessment experts are urging caution. This year, more than any in recent memory, calls for extreme care and restraint when analyzing statewide test scores, drawing conclusions, and taking action, they say.

Like schooling itself, standardized testing was deeply disrupted in many ways last spring, which may have distorted the meaning and utility of the results. In some cases, state test data will be virtually useless, the experts say. In others, with thoughtful analysis, the data can yield insights that could help leaders and educators allocate resources and help children rebuild academic muscle.

Here are some key considerations—and important cautions—for state, district, and school leaders, and teachers, to bear in mind as they review state test scores.

A lot happened with state tests in 2021 that could affect the results

In 2020, the U.S. Department of Education allowed states to skip federally required assessments. In 2021, however, states had to administer those tests. But that doesn’t mean it was business as usual.

In a handful of states, some students took tests remotely, while others took them in person. Massachusetts, for instance, allowed students in grades 3-8 to take remote tests if their schools were in remote learning mode, and more than 15 percent of those students did so.

Some states made other changes to their testing regimens. A few gave shortened versions of their tests. Colorado gave its English/language arts test only in grades 3, 5, and 7 and its math test only in grades 4, 6, and 8. In California, some districts gave the Smarter Balanced test, and others used assessments of their choosing.

Many states saw fewer students take the test than usual, though, and that is the factor poised to exert the most widespread influence on the validity and comparability of state test data. According to the Center for Reinventing Public Education, which has been monitoring states’ responses to COVID-19, of the 30 states that have released test results so far, only 14 reported test-participation rates of 90 percent or more.

Some states reported participation rates as low as 10 percent (New Mexico) and 30 percent (Oregon). Participation also varied markedly within states: Colorado reported regional participation rates ranging from 51 percent to 88 percent.

A number of factors fueled low participation rates, including decisions by many parents not to send their children into school buildings simply to take a test. And schools likely felt less pressure to insist that students show up for testing, since the Education Department waived its accountability rules that normally penalize schools for testing fewer than 95 percent of their students.

“There was a wide variety in the ways testing played out,” said Terra Wallin, who advised the Education Department on assessment and accountability from 2014 to 2017 and now oversees those issues for Education Trust, a civil rights advocacy group. “There are still ways states could look at general patterns [in test-score data], do a higher-level examination, to help them think about how best to use federal funding for recovery, but they need to proceed with caution.”

Ask key questions before deciding how to use the data

Experts say it’s important to ask three crucial questions about your state test data.

  • Did any of our students take the test remotely? If so, those scores shouldn’t be viewed as comparable to the scores of students who took it in person. That “mode effect” is a key tenet of assessment: Whether a student takes a test online or with paper and pencil can influence the results.
  • Did we use the same test as in 2019? If you switched tests, or changed the length or frequency of your test, a detailed expert analysis could be needed to confirm the validity of the 2021 results—were there enough questions in each strand of the academic standards, for instance, to generate a valid score?—and to establish that those results can be compared with 2019 results.
  • How many of our students—and which ones—took the test? This “participation rate,” experts say, is very important in understanding what state tests say—or can’t say—about student learning. They urge educators to dig deeper than the overall state or district participation rate and find out who took the test and who didn’t.

Imagine that an analysis shows that the students who skipped the test were disproportionately those who scored low in previous years. That would skew test results artificially high, and stalled progress might appear less severe than it actually is.
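To make that arithmetic concrete, here is a minimal Python sketch using six invented students (not real data). It assumes every student truly lost 10 points between 2019 and 2021 and that the two lowest 2019 scorers skipped the 2021 test. Comparing the 2021 testers with the full 2019 group then turns a real decline into an apparent gain.

```python
# Hypothetical illustration only: six made-up students whose scores all
# fall by 10 points between 2019 and 2021. These are not real test data.
scores_2019 = [200, 220, 240, 260, 280, 300]
scores_2021 = [s - 10 for s in scores_2019]  # everyone truly lost 10 points

# Assumption: the two lowest 2019 scorers skipped the 2021 test.
tested_2021 = [s21 for s19, s21 in zip(scores_2019, scores_2021) if s19 >= 240]


def mean(values):
    """Simple arithmetic mean."""
    return sum(values) / len(values)


# Change if every student had tested in both years.
true_change = mean(scores_2021) - mean(scores_2019)
# Change a naive report would show: 2021 testers vs. the full 2019 group.
naive_change = mean(tested_2021) - mean(scores_2019)

print(f"True change:  {true_change:+.1f}")   # -10.0
print(f"Naive change: {naive_change:+.1f}")  # +10.0
```

The naive comparison reports a 10-point gain even though every student in this made-up example lost 10 points, which is the kind of falsely rosy picture the experts warn about.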

That isn’t just speculation, either. It’s likely that remote learners account for many missed tests, and it became increasingly apparent during the pandemic that low-income, Black, and Latino students were far likelier to be learning remotely than other students. And emerging multistate research on state test results is finding that COVID’s impact on learning isn’t concentrated just in elementary schools, or among traditionally low-performing students, as early analyses of interim tests suggested; it’s broader, affecting students at all grades and achievement levels.

Enrollment declines, widely documented in many grades, can also play havoc with sound interpretations of test scores. Again, it’s important to understand the academic and demographic profiles of who stopped coming to school, experts say.

“If you aren’t paying attention to how the population is changing, you’re misinterpreting your scores,” said Andrew Ho, a Harvard University professor of education who focuses on assessment. He urges state leaders to perform a three-dimensional analysis of their test scores to ensure valid comparisons. This is done by separately comparing each group—the students who took tests and those who didn’t—only to groups who performed similarly in the past.
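One way to carry out the kind of like-with-like comparison Ho describes is sketched below in Python. It is a hypothetical illustration, not Ho's actual procedure: it sorts students into made-up prior-achievement bands by their 2019 scores, averages the 2019-to-2021 change only among students who tested in both years within each band, and weights each band by its share of the 2019 population. The key assumption is that, within a band, students who skipped the 2021 test would have changed about as much as those who took it.

```python
from collections import defaultdict

# Hypothetical records: (2019 score, 2021 score or None if the student
# did not test in 2021). Invented numbers for illustration only.
records = [
    (200, None), (210, 195), (220, None), (230, 215),  # lower 2019 band
    (260, 255), (270, 265), (280, 275), (290, 285),    # higher 2019 band
]


def band(score_2019):
    """Assign a student to a prior-achievement band (cutoff is arbitrary)."""
    return "low" if score_2019 < 250 else "high"


band_counts = defaultdict(int)    # how many 2019 testers fall in each band
band_changes = defaultdict(list)  # score changes for students who tested both years
for s19, s21 in records:
    b = band(s19)
    band_counts[b] += 1
    if s21 is not None:
        band_changes[b].append(s21 - s19)

total = len(records)
# Weight each band's average change by that band's share of the 2019 population.
# Bands with no 2021 testers are skipped here; a real analysis would have to
# handle that case explicitly.
adjusted_change = sum(
    (band_counts[b] / total) * (sum(changes) / len(changes))
    for b, changes in band_changes.items()
)

# Naive comparison: mean of 2021 testers minus mean of all 2019 testers.
tested = [(s19, s21) for s19, s21 in records if s21 is not None]
naive_change = (sum(s21 for _, s21 in tested) / len(tested)
                - sum(s19 for s19, _ in records) / total)

print(f"Naive change:    {naive_change:+.1f}")     # +3.3
print(f"Adjusted change: {adjusted_change:+.1f}")  # -10.0
```

Under that assumption, the band-weighted estimate recovers a decline that the naive comparison hides. If the assumption fails, or a band has no 2021 testers at all, this kind of adjustment cannot rescue the comparison.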

“We’ve just got to avoid a naïve analysis” of 2021 test-score data, said Derek Briggs, a University of Colorado professor who leads the National Council on Measurement in Education, whose members design and study K-12 assessments.

“The danger here is that we report 2021 scores as observed in 2021, without doing any other analysis. People want to compare them to 2019, and they’re going to interpret the difference as the effect of COVID.” But the pool of students who took the tests in 2021 changed, and that requires deeper analysis than in other years, he said.

Briggs is worried that districts and states won’t take the shifting test pool into account and will take reassurance from a falsely rosy picture. That’s a particular danger in any state or district where fewer than 90 percent of students took the test, he said. The smaller the share of students missing, the less chance that their absent scores distort overall results.

Participation rates below 50 percent would make it tough to draw any meaningful conclusions from test results, said Marianne Perie, the president of Measurement in Practice, which advises states on test design and use.
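Why low participation is so limiting can be illustrated with worst-case bounds (sometimes called Manski bounds), a technique not cited by the experts quoted here. If nothing is assumed about the students who skipped the test, they could have scored anywhere on the scale, so the overall mean can only be pinned to a range whose width equals the share of missing students times the width of the score scale. The Python sketch below assumes a hypothetical 100-to-400 scale and a hypothetical observed tester mean of 250.

```python
def worst_case_bounds(observed_mean, participation, score_min, score_max):
    """Range of possible overall means when nothing is assumed about non-testers.

    Non-testers could, in principle, have scored anywhere in
    [score_min, score_max], so the overall mean is bounded by assuming they
    all scored at the minimum (lower bound) or all at the maximum (upper bound).
    """
    tested_part = participation * observed_mean
    missing_share = 1 - participation
    return (tested_part + missing_share * score_min,
            tested_part + missing_share * score_max)


# Hypothetical scale and observed mean; participation rates echo those cited above.
for rate in (0.95, 0.90, 0.50, 0.30, 0.10):
    low, high = worst_case_bounds(observed_mean=250, participation=rate,
                                  score_min=100, score_max=400)
    print(f"participation {rate:.0%}: overall mean between {low:.0f} and {high:.0f}")
```

On this made-up scale, the range spans 30 points at 90 percent participation but 210 points at 30 percent, most of the scale. Tighter conclusions require assumptions about non-testers, such as the prior-achievement comparisons described earlier.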

Sean Reardon, who leads a Stanford University project that analyzes the links between test scores and children’s learning opportunities, said the insight into learning offered by last spring’s test scores is very limited because of all the factors influencing the scores.

“If you had a random sample of kids [in the testing pool], then that would be fine,” he said. “But testing in 2021 wasn’t random. Kids and families chose whether they took the test. Unless you have a lot of information to support a claim of comparability, I think the default assumption for 2021 is that they’re not comparable [to 2019 test scores]. I wouldn’t draw too many conclusions based on them and I’d use a lot of caveats.”

Consider ways to get insight into motivation and learning conditions

Ellen Forte, the chief executive officer and chief scientist at edCount, which advises states and districts on testing, said educators should bear in mind that millions of students, anxiety-riddled during COVID-19, were likely less motivated to do well on tests. Given that distortion, and the fact that state tests are not designed to yield highly detailed pictures of students’ achievement, she wouldn’t want to see students’ test scores used to make instructional decisions.

“Remember, these tests were designed for accountability,” Forte said. “The unit of focus should be the school, district, or state. Not the student.”

It also would behoove educators to understand more about the conditions in which students were learning, said Scott Marion, the executive director of the Center for Assessment, a consultant to states on testing. The organization has helped several states create student surveys that asked about things like their access to livestreamed instruction and how much they’d learned compared with the previous year. Teachers were asked, among other things, whether they’d been adequately supported with good professional development during the pandemic.

Understanding those conditions matters especially in a year like 2021. “I think it’s important,” Marion said. If a child tested in 2021 under conditions similar to 2019, educators can probably make sound (and very general) inferences about whether she gained or lost ground in those two years, Marion said. But what’s missing is the “why.” Gathering other data, from surveys, teacher observations, formative strategies, and interim assessments embedded in good curriculum, can shed light on “why my kids did poorly and what I might need to do differently,” he said.

Takeaway message: Multiple sources of data are more important than ever

Most experts consulted for this story agreed that with the right kinds of analyses, states can probably glean valuable information about patterns of low achievement so they can provide appropriate supports. They urged districts to press their states for detailed information and analysis to guide similar decisions at the district level.

In the classroom, though, experts differed on the role state test data should play in guiding instructional decisions for groups or individual students. Perie of Measurement in Practice said she wouldn’t want to see scores used for high-stakes decisions like grade promotion but thinks they could help teachers create flexible groupings in math or reading or dive more deeply into strands where class scores seemed weak.

Even better, Perie and other experts said, would be to blend test-score information with a portfolio of other data from formative or diagnostic tests, reports from students’ previous teachers, and other sources. This year, “you’ve got to triangulate, leveraging other measures like you never have before,” Harvard’s Ho said.

Superintendents understand this, said Dan Domenech, the executive director of AASA, the School Superintendents Association. They know it’s “critical to ascertain how much loss has taken place so they know where to begin,” but they recognize that standardized tests, while valuable, provide only “a general overview.” Accordingly, teachers will rely heavily on quizzes and other formative strategies to understand what their students need, he said.


A version of this article appeared in the November 17, 2021 edition of Education Week as State Test Results Are In. Are They Useless?

