Artificial Intelligence | What the Research Says

How AI Simulations Match Up to Real Students—and Why It Matters

By Sarah D. Sparks | September 10, 2025

AI-simulated students consistently outperform real students—and make different kinds of mistakes—in math and reading comprehension, according to a new study.

That could cause problems for teachers, who increasingly use general prompt-based artificial intelligence platforms to save time on daily instructional tasks. Sixty percent of K-12 teachers report using AI in the classroom, according to a June Gallup study, with more than 1 in 4 regularly using the tools to generate quizzes and more than 1 in 5 using AI for tutoring programs. The findings suggest that even when prompted to cater to students of a particular grade or ability level, the underlying large language models may create inaccurate portrayals of how real students think and learn.

“We were interested in finding out whether we can actually trust the models when we try to simulate any specific types of students. What we are showing is that the answer is in many cases, no,” said Ekaterina Kochmar, co-author of the study and an assistant professor of natural-language processing at the Mohamed bin Zayed University of Artificial Intelligence in the United Arab Emirates, the first university dedicated entirely to AI research.


How the study tested AI “students”

Kochmar and her colleagues prompted 11 large language models (LLMs), including those underlying generative AI platforms like ChatGPT, Qwen, and SocraticLM, to take on the persona of typical students in grades 4, 8, and 12 and answer 249 math and 240 reading grade-level questions from the National Assessment of Educational Progress. The researchers then compared the models’ answers to NAEP’s database of real student answers to the same questions to measure how closely the AI-simulated students’ performance mirrored that of actual students.
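To make the setup concrete, here is a minimal, hypothetical sketch of the general shape of such a simulate-and-compare experiment. The `query_model` stub, the persona prompt wording, and the sample items and percent-correct figures are illustrative assumptions, not the study’s actual prompts, items, or data.

```python
# Illustrative sketch (not the authors' code): prompt an LLM to answer a test
# item "as" a student at a given grade level, then compare the simulated
# accuracy with real-student percent-correct figures on the same items.

from dataclasses import dataclass

@dataclass
class Item:
    question: str
    correct_answer: str
    real_percent_correct: float  # hypothetical share of real students answering correctly

def query_model(model_name: str, prompt: str) -> str:
    """Placeholder for a call to an actual LLM API."""
    return "B"  # canned response, for illustration only

PERSONA_PROMPT = (
    "You are a typical grade {grade} student taking a national assessment. "
    "Answer the way such a student realistically would, including any "
    "mistakes that student might make.\n\nQuestion: {question}\nAnswer:"
)

def simulated_accuracy(model_name: str, grade: int, items: list[Item]) -> float:
    correct = 0
    for item in items:
        prompt = PERSONA_PROMPT.format(grade=grade, question=item.question)
        if query_model(model_name, prompt).strip() == item.correct_answer:
            correct += 1
    return correct / len(items)

if __name__ == "__main__":
    items = [
        Item("What is 3 + 4 x 2?  A) 14  B) 11  C) 10  D) 7", "B", 0.62),
        Item("Which fraction equals 1/2?  A) 2/3  B) 3/6  C) 3/5  D) 1/3", "B", 0.71),
    ]
    sim = simulated_accuracy("example-llm", grade=4, items=items)
    real = sum(i.real_percent_correct for i in items) / len(items)
    # The core comparison: does the simulated accuracy track real students'?
    print(f"Simulated accuracy: {sim:.0%} | Real-student accuracy: {real:.0%}")
```

The actual study went further than this accuracy comparison, also examining whether the simulated students’ wrong answers followed the same error patterns as real students’.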

The LLMs that underlie AI tools do not think but generate the most likely next word in a given context based on massive pools of training data, which might include real test items, state standards, and transcripts of lessons. By and large, Kochmar said, the models are trained to favor correct answers.

“In any context, for any task, [LLMs] are actually much more strongly primed to answer it correctly,” Kochmar said. “That’s why it’s very difficult to force them to answer anything incorrectly. And we’re asking them to not only answer incorrectly but fall in a particular pattern—and then it becomes even harder.”

For example, while a student might miss a math problem because they misunderstood the order of operations, an LLM would have to be specifically prompted to misapply the order of operations.
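A hypothetical sketch of what such misconception-targeted prompting might look like follows; the prompt text and the `query_model` stub are assumptions for illustration, not the study’s materials.

```python
# Hypothetical illustration of why pattern-specific errors are hard to elicit.

def misconception_prompt(question: str) -> str:
    """Ask the model to answer the way a student with a specific misconception would."""
    return (
        f"Question: {question}\n"
        "Respond exactly as a student who works strictly left to right, "
        "adding before multiplying (a common order-of-operations error), would respond."
    )

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return "11"  # models primed toward correct answers often return the right result anyway

print("Simulated answer:", query_model(misconception_prompt("What is 3 + 4 x 2?")))
# The pattern-specific wrong answer the prompt asks for would be 14, i.e. (3 + 4) x 2;
# the researchers found models often fail to reproduce such error patterns consistently.
```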

None of the tested LLMs created simulated students that aligned with real students’ math and reading performance in 4th, 8th, or 12th grades. Without specific grade-level prompts, the proxy students performed significantly higher than real students in both math and reading, scoring, for example, 33 to 40 percentile points higher than the average real student in reading.

Kochmar also found that simulated students “fail in different ways than humans.” While specifying grade levels in prompts did make simulated students’ overall accuracy look more like real students’, their wrong answers did not necessarily follow patterns tied to common human misconceptions, such as misapplying the order of operations in math.

The researchers found no prompt that fully aligned simulated and real student answers across different grades and models.

What this means for teachers

For educators, the findings highlight both the potential and the pitfalls of relying on AI-simulated students, underscoring the need for careful use and professional judgment.

“When you think about what a model knows, these models have probably read every book about pedagogy, but that doesn’t mean that they know how to make choices about how to teach,” said Robbie Torney, the senior director of AI programs at Common Sense Media, which studies children and technology.

Torney was not connected to the current study, but last month released a study of AI-based teaching assistants that similarly found alignment problems. AI models produce answers based on their training data, not professional expertise, he said. “That might not be bad per se, but it might also not be a good fit for your learners, for your curriculum, and it might not be a good fit for the type of conceptual knowledge that you’re trying to develop.”

This doesn’t mean teachers shouldn’t use general prompt-based AI to develop tools or tests for their classes, the researchers said, but that educators need to prompt AI carefully and use their own professional judgment when deciding whether AI outputs match their students’ needs.

“The great advantage of the current technologies is that it is relatively easy to use, so anyone can access [them],” Kochmar said. “It’s just at this point, I would not trust the models out of the box to mimic students’ actual ability to solve tasks at a specific level.”

Torney said educators need more training to understand not just the basics of how to use AI tools but also their underlying infrastructure. “To be able to optimize use of these tools, it’s really important for educators to recognize what they don’t have, so that they can provide some of those things to the models and use their professional judgment.”

