In a world increasingly embracing artificial intelligence’s potential to change the K-12 landscape, educators are asking: Can AI help me deliver high-quality feedback on my students’ tests that advances their learning?
Experts have for years been eyeing AI’s ability to help modernize standardized assessments, particularly to measure students’ ability to think critically and communicate. The technology has the potential to generate test items and score them more efficiently, as well as provide more actionable feedback to educators on their students’ strengths and weaknesses. Teachers are also using AI to help them develop and grade classroom assessments.
But AI has its shortfalls, too. Without proper guardrails on its use, generative AI poses risks to students’ privacy. The technology also sometimes offers inaccurate results, and feedback can be biased.
If educators can strike the right balance, using AI to bolster the assessments they give while ensuring humans drive and remain actively involved in the process, AI could be a useful tool in measuring students’ progress, some experts said during a recent webinar hosted by the American Educational Research Association and the news organization The 74.
AI is showing a lot of promise in expediting the scoring of assessments and helping teachers give feedback to students faster, said Victor Lee, an associate professor of education at Stanford University. It’s important, he said, that educators continue to review the feedback AI generates and take the time to incorporate their own personalized comments and scoring to avoid biases AI has been shown to have.
“Sometimes, acceleration is great and would actually bring more satisfaction and progress and joy to everybody’s work, but if it’s accelerating with a lot of risks associated, … then that’s something we want to be really cautious about,” Lee said.
But teachers haven’t been very enthusiastic about the potential of AI to make testing better. In a 2024 EdWeek Research Center survey, about 36% of teachers, school leaders, and district leaders said they think AI will make standardized testing worse in five years. About 19% said they believed the technology might improve assessments.
A smaller percentage of educators in that survey were using AI to grade assignments.
Still, districts need to make sure that teachers establish a “baseline literacy” on how AI works so that, regardless of what new iterations of the technology come along, there is an understanding of what it can do, how it does it, and its possible shortcomings or biases, said Danielle McNamara, the executive director of the Learning Engineering Institute at Arizona State University.
She encouraged districts to let teachers experiment with AI.
“One has to expect that the only way to really keep up is by using the tools with them, helping everyone to have exposure to them,” McNamara said. “It’s just like riding a bike. You have to ride the bike in order to learn how to do it.”
Rather than asking AI for answers to specific questions, teachers and students could learn how to ask AI, for example, to explain events from different perspectives, McNamara said, whether another political party’s view or that of a historical figure.
“We can help them to build personas and then build content out,” she said. “In the learning literature, we know that the more ways you learn something and encounter something from different perspectives and different modalities, the better.”
Schools and districts considering enlisting AI’s help in assessing students’ content knowledge should be intentional about the tools they use, Lee said.
Instead of trying to learn “how to use every single tool,” educators should think about, “What tool is appropriate for the task?” he said.
“There’s the marketing hype of, AI can do everything, and, even if it could—which I would maintain it cannot—we don’t want it to do everything,” Lee said. “When you provide assessment feedback to a student, you’re also saying, ‘I paid attention to what you said, I see where your potential is,’ and I’m communicating that and I’m giving you my time.”
Humans should always remain part of the assessment process, the panelists said, making sure that the scores and feedback AI returns line up with established rubrics and checking its results for errors.
“Because the AI doesn’t know information about students contextually, … leaving the AI to make all the decisions is not a good policy,” Lee said.
Before they give an assessment, teachers should have a clear understanding of the test’s goal, McNamara added, whether to quiz students on a specific topic, gauge understanding of writing concepts, or measure progress in reading comprehension. Different goals might require different approaches to scoring or measuring progress, she said.
And when in doubt, educators should go back to the golden rule, Lee said: “Use AI for things with others that you would want them to use AI on with you.”