Artificial Intelligence What the Research Says

How AI Simulations Match Up to Real Students—and Why It Matters

By Sarah D. Sparks, September 10, 2025

AI-simulated students consistently outperform real students—and make different kinds of mistakes—in math and reading comprehension, according to a new study.

That could cause problems for teachers, who increasingly use general prompt-based artificial intelligence platforms to save time on daily instructional tasks. Sixty percent of K-12 teachers report using AI in the classroom, according to a June Gallup study, with more than 1 in 4 regularly using the tools to generate quizzes and more than 1 in 5 using AI for tutoring programs. The findings suggest that, even when prompted to cater to students of a particular grade or ability level, the underlying large language models may portray inaccurately how real students think and learn.

“We were interested in finding out whether we can actually trust the models when we try to simulate any specific types of students. What we are showing is that the answer is in many cases, no,” said Ekaterina Kochmar, co-author of the study and an assistant professor of natural-language processing at the Mohamed bin Zayed University of Artificial Intelligence in the United Arab Emirates, the first university dedicated entirely to AI research.

How the study tested AI “students”

Kochmar and her colleagues prompted 11 large language models (LLMs), including those underlying generative AI platforms like ChatGPT, Qwen, and SocraticLM, to answer 249 math and 240 reading grade-level questions from the National Assessment of Educational Progress while adopting the persona of a typical student in grade 4, 8, or 12. The researchers then compared the models’ answers with NAEP’s database of real student answers to the same questions to measure how closely the simulated students’ performance mirrored that of actual students.
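
To make that comparison concrete, here is a minimal Python sketch of one way to score how far simulated students drift from real ones. It is an illustration only, not the researchers’ method: the function name `alignment_gap`, the data layout, and every number are hypothetical, and this article does not describe the study’s actual metrics.

```python
from statistics import mean


def alignment_gap(simulated_runs, real_rates):
    """Mean absolute gap between simulated and real per-question correct rates.

    simulated_runs: one list per question, holding 1 (correct) or 0 (incorrect)
                    for each simulated-student attempt. Hypothetical layout.
    real_rates:     share of real students who answered each question correctly.
    """
    if len(simulated_runs) != len(real_rates):
        raise ValueError("need one real-student rate per question")
    gaps = []
    for runs, real_rate in zip(simulated_runs, real_rates):
        sim_rate = mean(runs)  # share of simulated attempts that were correct
        gaps.append(abs(sim_rate - real_rate))
    return mean(gaps)


# Made-up data for three questions: the simulated student is right far more
# often than real students, which is the pattern the study reports.
simulated = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1]]
real = [0.75, 0.5, 0.25]
gap = alignment_gap(simulated, real)
print(round(gap, 3))
```

A gap of zero would mean the simulated student gets each question right exactly as often as real students do; larger values mean a poorer match. A score like this captures only how many answers are correct, not which mistakes are made.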

The LLMs that underlie AI tools do not think but generate the most likely next word in a given context based on massive pools of training data, which might include real test items, state standards, and transcripts of lessons. By and large, Kochmar said, the models are trained to favor correct answers.

“In any context, for any task, [LLMs] are actually much more strongly primed to answer it correctly,” Kochmar said. “That’s why it’s very difficult to force them to answer anything incorrectly. And we’re asking them to not only answer incorrectly but fall in a particular pattern—and then it becomes even harder.”

For example, while a student might miss a math problem because he misunderstood the order of operations, an LLM would have to be specifically prompted to misuse the order of operations.
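
The order-of-operations slip can be shown in a few lines of Python. This is a sketch of the misconception itself, not anything taken from the study; the helper `left_to_right` and the expression are made up for illustration.

```python
def left_to_right(tokens):
    """Evaluate numbers and operators strictly left to right, ignoring precedence."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    result = tokens[0]
    # Tokens alternate: number, operator, number, operator, ...
    for i in range(1, len(tokens), 2):
        result = ops[tokens[i]](result, tokens[i + 1])
    return result


correct = 2 + 3 * 4                                  # multiplication first: 14
misconception = left_to_right([2, "+", 3, "*", 4])   # (2 + 3) * 4 = 20
print(correct, misconception)
```

A student who works left to right lands on 20 instead of 14. By Kochmar’s account, a model would need to be told explicitly to make that kind of error.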

None of the tested LLMs created simulated students whose performance aligned with real students’ math and reading performance in 4th, 8th, or 12th grade. Without specific grade-level prompts, the simulated students scored significantly higher than real students in both math and reading. In reading, for example, they scored 33 to 40 percentile points higher than the average real student.

Kochmar also found that simulated students “fail in different ways than humans.” While specifying grades in prompts did make simulated students perform more like real students in terms of how many answers they got correct, their errors did not necessarily follow patterns tied to particular human misconceptions, such as misapplying the order of operations in math.

The researchers found no prompt that fully aligned simulated and real student answers across different grades and models.

What this means for teachers

For educators, the findings highlight both the potential and the pitfalls of relying on AI-simulated students, underscoring the need for careful use and professional judgment.

“When you think about what a model knows, these models have probably read every book about pedagogy, but that doesn’t mean that they know how to make choices about how to teach,” said Robbie Torney, the senior director of AI programs at Common Sense Media, which studies children and technology.

Torney was not connected to the current study, but last month released a study of AI-based teaching assistants that similarly found alignment problems. AI models produce answers based on their training data, not professional expertise, he said. “That might not be bad per se, but it might also not be a good fit for your learners, for your curriculum, and it might not be a good fit for the type of conceptual knowledge that you’re trying to develop.”

This doesn’t mean teachers shouldn’t use general prompt-based AI to develop tools or tests for their classes, the researchers said, but educators need to prompt AI carefully and use their own professional judgment when deciding whether AI outputs match their students’ needs.

“The great advantage of the current technologies is that it is relatively easy to use, so anyone can access [them],” Kochmar said. “It’s just at this point, I would not trust the models out of the box to mimic students’ actual ability to solve tasks at a specific level.”

Torney said educators need more training to understand not just the basics of how to use AI tools but their underlying infrastructure. “To be able to optimize use of these tools, it’s really important for educators to recognize what they don’t have, so that they can provide some of those things to the models and use their professional judgement.”
