This AI system makes human tutors better at teaching children math

The US has a major problem with education inequality. Children from low-income families are less likely to receive high-quality education, partly because poorer districts struggle to retain experienced teachers. 

Artificial intelligence could help, by improving the one-on-one tutoring sometimes used to supplement class instruction in these schools. With help from an AI tool, tutors could tap into more experienced teachers’ expertise during virtual tutoring sessions. 

Researchers from Stanford University developed an AI system called Tutor CoPilot on top of OpenAI’s GPT-4 and integrated it into a platform called FEV Tutor, which connects students with tutors virtually. Tutors and students type messages to one another through a chat interface, and a tutor who needs help explaining how and why a student went wrong can press a button to generate suggestions from Tutor CoPilot. 

The researchers created the model by training GPT-4 on a database of 700 real tutoring sessions in which experienced teachers worked one-on-one with first- to fifth-grade students on math lessons. In those sessions, the teachers identified the students’ errors and then worked with them to correct the errors in a way that helped them understand the broader concepts being taught. From this, the model generates responses that tutors can customize to help their online students.

“I’m really excited about the future of human-AI collaboration systems,” says Rose Wang, a PhD student at Stanford University who worked on the project, which was published on arXiv and has not yet been peer-reviewed. “I think this technology is a huge enabler, but only if it’s designed well.”

The tool isn’t designed to actually teach the students math—instead, it offers tutors helpful advice on how to nudge students toward correct answers while encouraging deeper learning. 

For example, it can suggest that the tutor ask how the student came up with an answer, or propose questions that could point to a different way to solve a problem. 

To test its efficacy, the team examined the interactions of 900 tutors virtually teaching math to 1,787 students between five and 13 years old from historically underserved communities in the US South. Half the tutors had the option to activate Tutor CoPilot, while the other half did not. 

The students whose tutors had access to Tutor CoPilot were 4 percentage points more likely to pass their exit ticket—an assessment of whether a student has mastered a subject—than those whose tutors did not have access to it. (Pass rates were 66% and 62%, respectively.)
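A gap of 4 percentage points is not the same as a 4% improvement, and the two are easy to confuse. The short Python sketch below works through the arithmetic using only the two pass rates reported above; the function name `compare_pass_rates` is a hypothetical helper for illustration and is not taken from the study.

```python
def compare_pass_rates(treatment_rate: float, control_rate: float) -> tuple[float, float]:
    """Return (percentage-point gap, relative gain in percent).

    Both inputs are pass rates expressed in percent, e.g. 66.0 for 66%.
    """
    # Absolute difference between the two rates, in percentage points.
    point_gap = treatment_rate - control_rate
    # The same difference expressed relative to the control group's rate.
    relative_gain = point_gap / control_rate * 100
    return point_gap, relative_gain


# Pass rates reported in the article: 66% with Tutor CoPilot access, 62% without.
point_gap, relative_gain = compare_pass_rates(66.0, 62.0)
print(f"{point_gap:.0f} percentage points, {relative_gain:.1f}% relative gain")
```

On these figures, the 4-point gap works out to a relative gain of roughly 6.5% over the pass rate of students whose tutors did not have the tool.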

The tool works as well as it does because it’s being used to teach relatively basic mathematics, says Simon Frieder, a machine-learning researcher at the University of Oxford, who did not work on the project. “You couldn’t really do a study with much more advanced mathematics at this current point in time,” he says.

The team estimates that the tool could improve student learning at a cost of around $20 per tutor annually to the tutoring provider, which is significantly cheaper than the thousands of dollars it usually takes to train educators in person. 

It has the potential to improve the relationship between novice tutors and their students by training them to approach problems the way experienced teachers do, says Mina Lee, an assistant professor of computer science at the University of Chicago, who was not involved in the project.

“This work demonstrates that the tool actually does work in real settings,” she says. “We want to facilitate human connection, and this really highlights how AI can augment human-to-human interaction.”

As a next step, Wang and her colleagues are interested in exploring how well novice tutors remember the teaching methods imparted by Tutor CoPilot. This could help them gain a sense of how long the effects of these kinds of AI interventions might last. They also plan to try to work out which other school subjects or age groups could benefit from such an approach.

“There’s a lot of substantial ways in which the underlying technology can get better,” Wang says. “But we’re not deploying an AI technology willy-nilly without pre-validating it—we want to be sure we’re able to rigorously evaluate it before we actually send it out into the wild. For me, the worst fear is that we’re wasting the students’ time.”
