It’s a spring ritual: Every year in the U.S., millions of schoolchildren take standardized state tests meant to show how well their states, districts, schools, and even teachers are helping them learn.
Another sampling of students takes the National Assessment of Educational Progress, or NAEP, better known as the Nation’s Report Card. Those results, released periodically, fill in the gaps to show how students in a particular state are performing relative to their peers.
That’s how accountability and assessment have worked in the United States at least since the advent of the No Child Left Behind Act back in 2002 and continuing with its replacement, the Every Student Succeeds Act of 2015.
And in fact, NAEP and Advanced Placement tests are prime components of Quality Counts’ Achievement Index, which grades and ranks states in this politically fraught category.
The United States is unique among countries in subjecting students so often to standardized tests, but as testing experts note, the resulting deluge of data comes with significant trade-offs on exam quality. And despite a few innovations under ESSA, plenty of those experts also wonder whether the road not taken might have produced a more nuanced and useful, if less frequent, trove of information.
Testing every student every year is a costly prospect, said Marc Tucker, the president and CEO of the National Center on Education and the Economy, a research and policy organization in Washington. Tucker’s research has focused on the policies and practices of the countries with the best education systems.
And the expense means that the tests are often lower quality than tests used in other countries, and a poor gauge of the higher-order critical thinking skills that students need in college, the workforce, and life, he added.
“We’ve made it virtually impossible to have the quality of tests that other nations that are far ahead of us are using to determine how well their own kids are doing,” Tucker said. “So what we’ve done is to deprive ourselves of tests that will enable us to measure the things that are the most important about whether or not our kids are going to be ready for what’s coming. That’s a very poor trade. A very poor trade.”
By contrast, very few of the highest-performing countries test students every year, Tucker said. And when they do test, they often use deeper assessments that include performance tasks or writing prompts, giving educators a richer understanding of what students know and are able to do.
Singapore, for example, outperforms the U.S. on international measures such as the Program for International Student Assessment, or PISA, in average reading, math, and science performance. It tests students only about three times in the course of their careers: once at the end of elementary school, once in middle school, and once in high school, Tucker said.
Focus on Equity
But there are also advantages to the American system. One huge plus: Testing every student every year gives policymakers and educators a sense of how different demographic groups are doing relative to each other within the same school, said Randy Bennett, a research chair at the Educational Testing Service, or ETS, a testing company that administers the SAT and other assessments and has assisted with NAEP.
After the passage of the NCLB law, “we could know for the first time that a very good school was not performing so well when you looked at some of its demographic groups,” he said. “If you care about equity, it’s a strength” of the American system.
But Bennett, like Tucker, acknowledged that testing students frequently can lead to lower-quality assessments.
One “consequence of having to test all students is that you want to do it as efficiently [as possible], which takes you down the road of tests that don’t necessarily reflect the kind of tasks or at least the kind of breadth and depth and processing that you would like to use in teaching,” he said.
That’s been a big frustration for Matthew Blomstedt, the commissioner of education in Nebraska. He understands the need for data and wants to hold schools accountable. But he’s worried that concentrating on end-of-the-year test scores leaves a lot of other factors that could contribute to student achievement by the wayside.
“It’s not that math and reading scores don’t matter, because they do,” Blomstedt said. “But we’ve put so much focus on those things and used them as the driving force of how we’re making educational decisions in this country.”
He said he’s visited Native American tribes in Nebraska and offered up math and reading coaches, when what they really needed were counselors for their students.
Blomstedt is exploring how Nebraska can use data from interim assessments—which he said focus on a broader range of skills that better reflect what students are actually learning in class—to better inform accountability. That’s something that’s allowed under ESSA, but few states have taken advantage of it.
ESSA also gives states a chance to move beyond the kind of fill-in-the-bubble tests that have drawn sharp criticism from educators and some experts. Up to seven states can opt to participate in a pilot program allowing them to try out new kinds of tests in a handful of districts, with the goal of eventually taking the tests statewide.
But there are significant hurdles to participating, including showing that these new assessments are “comparable” to state tests. So far, only two states—Louisiana and New Hampshire—have applied for the pilot.
Previous efforts to improve the quality of state tests have had mixed results.
In 2010, the Obama administration funded the development of tests meant to provide a richer sense of student skills. The resulting PARCC and Smarter Balanced tests were still in use by 21 states during the 2016-17 school year, according to an Education Week review.
But many states have dropped those tests, particularly at the high school level, in part because of complaints from parents that they were too long or took up too much classroom time.
Picking a yardstick for student achievement from among the jumble of measures can be tricky and subjective—and can leave out factors that some would argue are crucial to a fully rounded picture of student achievement.
The Education Week Research Center, for instance, considers 4th and 8th grade NAEP scores, graduation rates, Advanced Placement scores, and gaps between students in poverty and their peers in determining which states have the highest achievement.
The index looks at both current achievement and growth over time in those measures. In crafting the index, the research center put a premium on metrics that would be continually updated and are common across all 50 states.
Those indicators may be missing some nuances, but they have “value because they provide an apples-to-apples comparison,” said Sterling Lloyd, the assistant director of the research center. And he said it “rewards states that have made progress on some important measures.” That’s especially key because it gives state policymakers a chance to evaluate their strengths and weaknesses against other states, Lloyd added.
The Achievement Index couldn’t capture high school performance, other than AP test results, because 12th grade NAEP scores are not available at the state level. It also could not include things like real-world problem-solving skills, teamwork, and collaboration, because standardized tests don’t measure them very well.
And a test score can’t tell policymakers and educators everything they need to know about whether a particular student is actually improving, said Leslie Rutkowski, an associate professor of inquiry methodology at Indiana University in Bloomington who specializes in educational measurement.
“As long as we are using it as one piece of evidence in a broader profile, then fine,” she said. “But if that is the one thing that we’re using to make decisions, I think that’s very risky and prone to error. We’re just not that good at testing.”
She noted that some other countries’ assessment systems carry high stakes for students in a way that tests in the U.S. don’t.
In Germany, for example, students take a test at about age 18 to determine if they can pursue higher education or an apprenticeship program. By contrast, in the U.S. a student can perform poorly on college entrance exams such as the ACT or SAT and still get a four-year diploma, although that student’s college choices may be more limited.
To be sure, ESSA requires states to look beyond test scores and graduation rates in gauging student achievement. More than 30 states have included chronic absenteeism, or some measure of attendance, in their accountability plans. And at least 35 have included some kind of “college and career readiness” indicator.
Academic Measures Still King
Nevertheless, academic measures—including test scores, graduation rates, and English-language proficiency—are still king under the law.
In the U.S., policymakers tend to think of student achievement and school quality as one and the same. If a school shows growth in test scores, or has high scores, it is rewarded. Schools where children are underperforming—or slipping—are targeted for extra help. But there are problems with linking school quality and student achievement, Rutkowski said.
“Those conversations are conflated,” she said. “We can’t reasonably attribute the performance of the school or assign the performance of a school to an individual. You can well have high-performing students in poor schools and low-performing students in great schools.”
Other countries, such as the Netherlands, are more holistic in their approach to gauging school quality, said Daniel Koretz, a professor of education at Harvard University and the author of The Testing Charade: Pretending to Make Schools Better.
Dutch schools are allowed to choose their own standardized tests, although in practice most use the same exam, he said. But they are also subject to intensive inspections. (Some U.S. states—including Vermont—are trying to emulate that approach, as Education Week has reported.)
In the Netherlands, people “will often get test scores in the context of a [school] inspection report that has a lot more information,” Koretz said. “They don’t shy away from the fact that you need human judgement to have a complete evaluation of schools.”
A version of this article appeared in the September 05, 2018 edition of Education Week as The Testing Ritual Continues, But Is There Something Missing?