Opinion Blog

Classroom Q&A

With Larry Ferlazzo

In this EdWeek blog, an experiment in knowledge-gathering, Ferlazzo will address readers’ questions on classroom management, ELL instruction, lesson planning, and other issues facing teachers. Send your questions to lferlazzo@epe.org. Read more from this blog.

Assessment Opinion

We Need to Stop Overrelying on Student Test Scores

By Larry Ferlazzo — April 28, 2026 6 min read

What educator hasn’t sometimes felt frustrated at the overreliance on test scores to measure the success of their students and their teaching?

Today’s guest post explores alternatives to that approach.

‘Expanding Rigor’

James Soland is an associate professor of research, statistics, and evaluation at the University of Virginia School of Education and Human Development whose work focuses on assessment (psychometrics), evaluation, and data use:

Educators know the drill: A new program rolls out, someone gathers test scores before and after, and they are told whether it “worked.” But what does that really tell us? Does it help improve teaching? Does it help educators and policymakers understand why it worked here but not there? Does it help us understand what students actually gained and experienced?

As I discuss in a recent blog at the Brookings Institution, right now, the field of education evaluation is fixated on one question—“what works?”—narrowly defined as whether a program causes measurable improvements in things like test scores. That is certainly valuable at times, but it misses too much of what matters in schools. It privileges what we can easily measure over what we ought to understand. And it treats school contexts as interchangeable backdrops rather than vital elements of success.

As teachers and school leaders, you know that learning is messy, local, and human. It unfolds in specific classrooms with specific students and adults doing real work together. The current dominant approach—evaluating programs like black boxes and judging them by a narrow set of outcomes—doesn’t capture that reality. It leaves out teacher expertise, student experience, school culture, and the many contextual conditions that make a strategy effective here but not somewhere else.

We need evaluation that learns from you—the educators in the trenches—not just about students’ test results. You are the people who see when a strategy sparks student curiosity, when it bumps up against local realities, or when it might not work when applied to a different set of students. You also know when a program is truly moving the needle, versus being done purely out of compliance.

What’s the problem with “black box” evaluations?

Most rigorous program evaluations focus on isolating causal effects: “Did X cause Y?” That’s mostly done with experiments or quasi-experiments using standardized outcomes that policymakers and researchers can compare across time and place. But this approach has two big limitations:

  • It favors outcomes that are easy to quantify—like state test scores—over equally important outcomes that are harder to measure, like critical thinking, collaboration, or students’ sense of belonging. Those latter components are often at the heart of teachers’ daily decisions and matter deeply for long-term learning.
  • It treats context—the school, community norms, teacher skills, resources, and culture—as something to control away, rather than a source of insight about how and why something works.

This narrow focus leads to evaluations that feel detached from reality. They tell you whether something worked somewhere but not why it worked, how it worked, and under what conditions it might work in your own context.
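To make the "black box" critique concrete, here is a minimal sketch of what such an evaluation often reduces a program to: a single standardized effect size (Cohen's d) comparing treatment and control test scores. The function name and score data below are hypothetical, purely for illustration; the point is how much this one number leaves out about mechanism and context.

```python
import statistics

def cohens_d(treatment_scores, control_scores):
    """Standardized mean difference between two groups of test scores."""
    n_t, n_c = len(treatment_scores), len(control_scores)
    mean_t = statistics.fmean(treatment_scores)
    mean_c = statistics.fmean(control_scores)
    # Pooled standard deviation across both groups (sample variances)
    var_t = statistics.variance(treatment_scores)
    var_c = statistics.variance(control_scores)
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (mean_t - mean_c) / pooled_sd

# Hypothetical post-test scores for two classrooms
treatment = [78, 85, 90, 74, 88, 81]
control = [75, 80, 83, 72, 79, 77]

d = cohens_d(treatment, control)
print(f"Effect size (Cohen's d): {d:.2f}")  # → Effect size (Cohen's d): 0.98
```

A report built on this number alone can say the program "worked," but it cannot say why, for whom, or under what conditions, which is exactly the gap the author describes.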

So what might a better evaluation look like? Let’s use the example of a socio-emotional intervention (say, one aimed at boosting growth mindset or self-management skills) designed to improve that socio-emotional competency and, thereby, also improve achievement. Here are some key elements:

1. Broaden outcomes beyond test scores.

Standardized tests capture important academic skills, but they miss socio-emotional growth, critical reasoning, cultural competence, and other dimensions of learning that teachers nurture every day. When evaluation counts these too—even if they’re harder to quantify—it aligns more closely with what matters for students.

In the socio-emotional-competency example, it would mean not only looking at achievement gains but also at changes in self-management or growth mindset, ideally using a survey measure designed to understand change over time. It would further involve asking teachers whether they think the intervention actually improved the competency or if it was more likely a measurement artifact (e.g., students better anticipating the “correct” answer on the survey after the intervention).


2. Mix numbers with narratives.

Rigorous causal work has its place—but it should sit next to rich qualitative evidence. That means intentionally gathering teacher perspectives, student voices, and descriptions of administrator experiences. Qualitative research has often been peripheral in program evaluation, but it helps us understand mechanisms—the how and why of what works—not just the if. In the case of the socio-emotional-competency intervention, interviews would ask teachers whether they saw a valid causal chain: the intervention increased self-management or growth mindset, and that improvement in turn produced achievement gains.

They would talk about whether the intervention is easy enough to implement that it could be part of common practice. If the intervention did not show gains (in the competency or achievement), teachers would provide qualitative data on why not.

3. Make context part of the question, not something to control away.

Instead of treating local conditions as noise, good evaluation treats them as data. Knowing how a rural school engaged parents or how a multilingual classroom adapted a reading program can teach us about transportability and adaptation.

For the socio-emotional-competency example, that could look like asking teachers whether they felt there was support for the intervention (e.g., sufficient time to implement it well), whether there were bureaucratic hurdles, whether it did or did not work for students with particular learning challenges (e.g., students with a particular IEP), how their particular school and setting affected outcomes, etc.

4. Use mechanisms to guide improvement.

Instead of only reporting that an intervention “worked,” evaluations should articulate how it produced results. Was it because teachers had more collaboration time? Because students engaged more deeply with texts that reflected their lived experience? Because instructional coaching supported risk-taking? Because teachers recognized the value and bought into the strategy? These mechanisms—not just outcomes—are the critical lessons for replication and improvement.

In the socio-emotional-competency case, all of these mechanisms could emerge during teacher interviews, surveys, focus groups, or whatever was the most efficient use of their time. (Obviously, these additional data would be collected mainly in large-scale evaluations with sufficient resources to compensate teachers fairly.)

Putting contexts and conditions front and center with help from teachers

Teachers are constantly evaluating: They watch a lesson unfold, notice signs of engagement, troubleshoot misconceptions, and decide what to try next. That expertise—grounded in context and informed by deep knowledge of students—needs to be part of how we study educational effectiveness. By expanding our definition of evidence and privileging why and how as much as whether, we make evaluation more useful.
And by expanding the aperture of the questions we ask, I believe we can, just maybe, move past a hyper-fixation on test scores.

Moving beyond test scores and black boxes doesn’t mean abandoning rigor. It means expanding rigor to include modes of inquiry that respect complexity without losing clarity. It means building evaluation systems that help educators and policymakers learn what works for whom, why it works, and what to do next.


Thanks to James for contributing his thoughts.

Consider contributing a question to be answered in a future post. You can send one to me at lferlazzo@epe.org. When you send it in, let me know if I can use your real name if it’s selected or if you’d prefer remaining anonymous and have a pseudonym in mind.

You can also contact me on X at @Larryferlazzo or on Bluesky at @larryferlazzo.bsky.social

Just a reminder: You can subscribe and receive updates from this blog via email. And if you missed any of the highlights from the first 13 years of this blog, you can see a categorized list here.

The opinions expressed in Classroom Q&A With Larry Ferlazzo are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
