Assessment

State Test Results Are In. Are They Useless?

By Catherine Gewertz, October 21, 2021

Educators have been bracing for them, and now they’re here: the first state test results since COVID-19 interrupted K-12 schooling. Districts, states, and schools are poring over the data from spring 2021 tests, hoping to understand exactly how—and how badly—the pandemic affected children’s learning.

But even though educators are hungry for insight, assessment experts are urging caution. This year, more than any in recent memory, calls for extreme care and restraint when analyzing statewide test scores, drawing conclusions, and taking action, they say.

Like schooling itself, standardized testing was deeply disrupted in many ways last spring, which may have distorted the meaning and utility of the results. In some cases, state test data will be virtually useless, the experts say. In others, with thoughtful analysis, the data can yield insights that could help leaders and educators allocate resources and help children rebuild academic muscle.

Here are some key considerations—and important cautions—for state, district, and school leaders, and teachers, to bear in mind as they review state test scores.

A lot happened with state tests in 2021 that could affect the results

In 2020, the U.S. Department of Education allowed states to skip federally required assessments. In 2021, however, states had to administer those tests. But that doesn’t mean it was business as usual.

In a handful of states, some students took tests remotely, while others took them in person. Massachusetts, for instance, allowed students in grades 3-8 to take remote tests if their schools were in remote learning mode, and more than 15 percent of those students did so.

Some states made other changes to their testing regimens. A few gave shortened versions of their tests. Colorado gave its English/language arts test only in grades 3, 5, and 7 and its math test only in grades 4, 6, and 8. In California, some districts gave the Smarter Balanced test, and others used assessments of their choosing.

Many states saw fewer students take the test than usual, though, and that is the factor poised to exert the most widespread influence on the validity and comparability of state test data. According to the Center for Reinventing Public Education, which has been monitoring states’ responses to COVID-19, of the 30 states that have released test results so far, only 14 reported test-participation rates of 90 percent or more.

Some states reported participation rates as low as 10 percent (New Mexico) and 30 percent (Oregon). Participation also varied markedly within states: Colorado reported regional participation rates ranging from 51 percent to 88 percent.

A number of factors fueled low participation rates, including many parents who chose not to send their children into school buildings simply to take a test. And schools likely felt less pressure to insist that students show up for testing, since the Education Department waived its accountability rules that normally penalize schools for testing fewer than 95 percent of their students.

“There was a wide variety in the ways testing played out,” said Terra Wallin, who advised the Education Department on assessment and accountability from 2014 to 2017 and now oversees those issues for Education Trust, a civil rights advocacy group. “There are still ways states could look at general patterns [in test-score data], do a higher-level examination, to help them think about how best to use federal funding for recovery, but they need to proceed with caution.”

Ask key questions before deciding how to use the data

Experts say it’s important to ask three crucial questions about your state test data.

  • Did any of our students take the test remotely? If so, those scores shouldn’t be viewed as comparable to the scores of students who took it in person. That “mode effect” is a key tenet of assessment: Whether a student takes a test online or with paper and pencil can influence the results.
  • Did we use the same test as in 2019? If you switched tests, or changed the length or frequency of your test, a detailed expert analysis could be needed to confirm the validity of the 2021 results—were there enough questions in each strand of the academic standards, for instance, to generate a valid score?—and to establish that those results can be compared with 2019 results.
  • How many of our students—and which ones—took the test? This “participation rate,” experts say, is very important in understanding what state tests say—or can’t say—about student learning. They urge educators to dig deeper than the overall state or district participation rate and find out who took the test and who didn’t.

Imagine that an analysis shows that the students who skipped the test were disproportionately those who scored low in previous years. That would skew test results artificially high, and stalled progress might appear less severe than it actually is.
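To make that arithmetic concrete, here is a minimal sketch with invented numbers (the group labels, enrollments, tested counts, and scale scores are all hypothetical, not drawn from any state's data). It compares the average for every enrolled student with the average for only the students who sat for the test, when low scorers skip at a higher rate.

```python
# Hypothetical cohort: every number below is invented for illustration.
# Each row: (group label, students enrolled, students tested, average scale score)
groups = [
    ("previously low-scoring", 400, 200, 480.0),   # half skipped the test
    ("previously high-scoring", 600, 570, 520.0),  # nearly all tested
]

def full_population_mean(rows):
    """Average score if every enrolled student had been tested."""
    enrolled = sum(n for _, n, _, _ in rows)
    return sum(n * score for _, n, _, score in rows) / enrolled

def observed_mean(rows):
    """Average score among only the students who actually tested."""
    tested = sum(t for _, _, t, _ in rows)
    return sum(t * score for _, _, t, score in rows) / tested

def participation_rate(rows):
    """Share of enrolled students who tested."""
    return sum(t for _, _, t, _ in rows) / sum(n for _, n, _, _ in rows)

true_avg = full_population_mean(groups)
seen_avg = observed_mean(groups)
rate = participation_rate(groups)

print(f"participation: {rate:.0%}")
print(f"average if everyone tested: {true_avg:.1f}")
print(f"average among testers only: {seen_avg:.1f}")
```

In this made-up case, a 77 percent participation rate lifts the reported average by more than five points even though no student's score changed.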

That isn’t just speculation, either. It’s likely that remote learners account for many missed tests, and it became increasingly apparent during the pandemic that low-income, Black, and Latino students were far likelier to be learning remotely than other students. And emerging multistate research on state test results is finding that COVID’s impact on learning isn’t concentrated just in elementary schools, or among traditionally low-performing students, as early analyses of interim tests suggested; it’s broader, affecting students at all grades and achievement levels.

Enrollment declines, widely documented in many grades, can also play havoc with sound interpretations of test scores. Again, it’s important to understand the academic and demographic profiles of who stopped coming to school, experts say.

“If you aren’t paying attention to how the population is changing, you’re misinterpreting your scores,” said Andrew Ho, a Harvard University professor of education who focuses on assessment. He urges state leaders to perform a three-dimensional analysis of their test scores to ensure valid comparisons. This is done by separately comparing each group—the students who took tests and those who didn’t—only to groups who performed similarly in the past.
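One way to picture the idea of comparing like with like is the rough sketch below. It is not Ho's actual procedure, only an illustration under stated assumptions: the two prior-achievement bands, the counts, and the scores are all invented. It contrasts a naive overall 2019-to-2021 comparison with a comparison made separately within each band.

```python
# All figures are invented for illustration; they are not from any state.
# Students are grouped into hypothetical bands by prior achievement.
# For each band and year: (number of students tested, average scale score)
bands = {
    "low prior":  {"2019": (400, 480.0), "2021": (200, 470.0)},
    "high prior": {"2019": (600, 520.0), "2021": (570, 512.0)},
}

def overall_mean(data, year):
    """Average across all tested students in a year, ignoring bands."""
    tested = sum(band[year][0] for band in data.values())
    weighted = sum(band[year][0] * band[year][1] for band in data.values())
    return weighted / tested

# Naive comparison: 2021 testers versus 2019 testers, whoever they were.
naive_change = overall_mean(bands, "2021") - overall_mean(bands, "2019")

# Like-with-like comparison: change in the average within each band.
within_band_change = {
    name: band["2021"][1] - band["2019"][1] for name, band in bands.items()
}

print(f"naive overall change: {naive_change:.1f}")
for name, change in within_band_change.items():
    print(f"change within '{name}' band: {change:.1f}")
```

With these made-up figures, the naive comparison shows a drop of about three points, while each band on its own fell by eight to ten. The gap comes entirely from the change in who tested.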

“We’ve just got to avoid a naïve analysis” of 2021 test-score data, said Derek Briggs, a University of Colorado professor who leads the National Council on Measurement in Education, whose members design and study K-12 assessments.

“The danger here is that we report 2021 scores as observed in 2021, without doing any other analysis. People want to compare them to 2019, and they’re going to interpret the difference as the effect of COVID.” But the pool of students who took the tests in 2021 changed, and that requires deeper analysis than in other years, he said.

Briggs is worried that districts and states won’t take the shifting test pool into account, and they’ll take reassurance from a falsely rosy picture. That’s a particular danger in any state or district where fewer than 90 percent of students took the test, he said. The smaller the share of missing students, the less chance the missing scores have of affecting overall results.

Participation rates below 50 percent would make it tough to draw any meaningful conclusions from test results, said Marianne Perie, the president of Measurement in Practice, which advises states on test design and use.

Sean Reardon, who leads a Stanford University project that analyzes the links between test scores and children’s learning opportunities, said the insight into learning offered by last spring’s test scores is very limited because of all the factors influencing the scores.

“If you had a random sample of kids [in the testing pool], then that would be fine,” he said. “But testing in 2021 wasn’t random. Kids and families chose whether they took the test. Unless you have a lot of information to support a claim of comparability, I think the default assumption for 2021 is that they’re not comparable [to 2019 test scores]. I wouldn’t draw too many conclusions based on them and I’d use a lot of caveats.”

Consider ways to get insight into motivation and learning conditions

Ellen Forte, the chief executive officer and chief scientist at edCount, which advises states and districts on testing, said educators should bear in mind that millions of students, anxiety-riddled during COVID-19, were likely less motivated to do well on tests. Given that distortion, and the fact that state tests are not designed to yield highly detailed pictures of students’ achievement, she wouldn’t want to see students’ test scores used to make instructional decisions.

“Remember, these tests were designed for accountability,” Forte said. “The unit of focus should be the school, district, or state. Not the student.”

It also would behoove educators to understand more about the conditions in which students were learning, said Scott Marion, the executive director of the Center for Assessment, a consultant to states on testing. The organization has helped several states create student surveys that asked about things like their access to livestreamed instruction and how much they’d learned compared with the previous year. Teachers were asked, among other things, whether they’d been adequately supported with good professional development during the pandemic.

In a year like 2021, “I think it’s important,” Marion said. If a child tested in 2021 under conditions similar to 2019, educators can probably make sound—and very general—inferences about whether she gained or lost ground in those two years, Marion said. But what’s missing is the “why.” Gathering other data, from surveys, teacher observations, formative strategies, and interim assessments embedded in good curriculum, can shed light on “why my kids did poorly and what I might need to do differently,” he said.

Takeaway message: Multiple sources of data are more important than ever

Most experts consulted for this story agreed that with the right kinds of analyses, states can probably glean valuable information about patterns of low achievement so they can provide appropriate supports. They urged districts to press their states for detailed information and analysis to guide similar decisions at the district level.

In the classroom, though, experts differed on the role state test data should play in guiding instructional decisions for groups or individual students. Perie of Measurement in Practice said she wouldn’t want to see scores used for high-stakes decisions like grade promotion but thinks they could help teachers create flexible groupings in math or reading or dive more deeply into strands where class scores seemed weak.

Even better, Perie and other experts said, would be to blend test-score information with a portfolio of other data from formative or diagnostic tests, reports from students’ previous teachers, and other sources. This year, “you’ve got to triangulate, leveraging other measures like you never have before,” Harvard’s Ho said.

Superintendents understand this, said Dan Domenech, the executive director of AASA, the School Superintendents Association. They know it’s “critical to ascertain how much loss has taken place so they know where to begin,” but they recognize that standardized tests, while valuable, provide only “a general overview.” Accordingly, teachers will rely heavily on quizzes and other formative strategies to understand what their students need, he said.

A version of this article appeared in the November 17, 2021 edition of Education Week as State Test Results Are In. Are They Useless?
