Megan Ennion, PhD Student, Faculty of Education
05 August 2025
A growing body of research suggests that Large Language Model (LLM) tutors could reshape how students engage with learning, offering promising new forms of support but also raising important concerns about their educational impact. While they may offer scalable, on-demand support to schools grappling with teacher shortages and rising absenteeism, there are also fears about dependency, equity, and misinformation.
Beyond these acknowledged risks, much less attention has been given to how interacting with LLM tutors might shape students’ behaviour and psychological engagement with learning. Experimental research on these behavioural and psychological impacts remains scarce, and this gap limits our understanding and hinders the effective, ethical, and beneficial integration of LLM tutors into learning environments.
While AI development should be evidence-based, tutoring tools are increasingly being designed and deployed ahead of robust research, raising concerns that design choices may be driven more by commercial priorities than pedagogical insight. Without a more complete understanding of their impact, opportunities to meaningfully improve student learning may be missed or, worse, displaced by unintended harms.
Going back to school
To help address this lack of evidence, my project focused on examining the effect of interacting with LLM tutors on student learning psychology and key behaviours: effort, perseverance, resilience, and challenge-seeking. These behaviours are linked not only to academic achievement, but also to broader life outcomes, such as improved health and greater life satisfaction. The study aimed to provide a robust and informative comparison of behavioural differences among students supported by LLM tutors, human tutors, and those working independently, and to explore the mechanisms underlying any observed behavioural changes.
I conducted the experiment in a sixth form college with 150 students. Each student was presented with a series of eight challenging mathematics questions, delivered via a newly devised computer-based task sequence. The questions were divided into two sections, designed to be above the students’ expected A-level capabilities and to increase gradually in difficulty. The task was voluntary, and students could decide how many questions to attempt. After completing the main task, students took part in a challenge-seeking exercise, in which they selected from a set of hypothetical future questions based on their preferred level of difficulty.
Each student was randomly assigned to one of the three support conditions: one group received help from a human teacher, another used an LLM AI tutor called Tutor Me (a simplified version of Khan Academy’s Khanmigo), and the third had access to open internet resources as a form of unsupported study. They were free to engage with their assigned support in any way they chose while working through the tasks.
Throughout the session, behavioural data were recorded via the computer-based task sequence, allowing for detailed analysis of effort, perseverance, resilience, and challenge-seeking. Indicators included the number of questions attempted, self-reported difficulty and effort, responses to the second set of questions, and the difficulty level of the questions chosen. For students in the AI tutor and human tutor conditions, interactions were also recorded to support further analysis of behaviours and student-tutor dialogue. After completing the tasks, students reflected on their experience in a qualitative questionnaire, which explored how they felt during the session.
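For readers curious about what these behavioural indicators might look like as data, here is a minimal sketch of a possible per-student record. It is illustrative only, not the actual schema used in the study; every field name is hypothetical and based solely on the indicators described above.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class StudentSessionRecord:
        # Hypothetical per-student record; names and types are illustrative only.
        student_id: str
        condition: str                       # "human_tutor", "llm_tutor", or "independent"
        questions_attempted: int             # out of the eight available
        self_reported_difficulty: List[int]  # e.g. a rating after each question
        self_reported_effort: List[int]      # e.g. a rating after each question
        attempted_second_section: bool       # one possible proxy for perseverance/resilience
        chosen_challenge_level: int          # difficulty selected in the challenge-seeking exercise
        tutor_transcript: str = ""           # recorded only in the two tutored conditions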
Under the supervision of Dr Ros McLellan, Associate Professor in Teacher Education and Development / Pedagogical Innovation, I was awarded a grant from the Accelerate Programme in collaboration with the Cambridge Centre for Data-Driven Discovery (C2D3) to fund the fieldwork, including a human teacher to offer support in the ‘human teacher’ condition. Without the grant, we would not have been able to explore how students’ relationships with a human tutor differ from their relationships with a machine.
Future studies
Having now completed data collection, we are beginning to analyse the data, which may include using an LLM to identify patterns in the data from the transcripts and qualitative questionnaires.
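As an illustration of what LLM-assisted pattern-finding in the transcripts and questionnaires might involve, here is a minimal sketch of asking a general-purpose chat model to suggest themes. It assumes the OpenAI Python client and an arbitrary model name; it is not the analysis pipeline used in the study.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def suggest_themes(transcript: str) -> str:
        """Ask a chat model to propose candidate themes in a tutoring transcript.
        Illustrative only; the prompt and model choice are assumptions."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical model choice
            messages=[
                {"role": "system",
                 "content": "You are helping with qualitative analysis of student-tutor dialogue."},
                {"role": "user",
                 "content": "Identify recurring themes relating to effort, perseverance, "
                            "resilience and challenge-seeking in this transcript:\n\n" + transcript},
            ],
        )
        return response.choices[0].message.content

In practice, any machine-suggested themes would need to be checked against researcher-led coding rather than taken at face value.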
I recently published a paper explaining the theoretical basis for this study. In it, I explore how AI chatbots could provide valuable support for independent study by giving students the confidence to tackle difficult academic challenges, leading to more productive learning behaviours, as well as the barriers to realising this potential.
On completing data analysis, the next step will be to share the findings and raise awareness of the study through publications, talks, and workshops, including one for the AI & Education Community, which I founded and lead, supported by Accelerate funding. Through this, I hope to engage educators, academics, not-for-profits, edtech designers, and policymakers. After my PhD, I aim to continue this work with a more in-depth longitudinal study exploring how interacting with generative AI tutors influences important learning behaviours over time.
As LLM tutors become more advanced and cost-effective, they are likely to become widely accessible in mainstream education, making research-based evidence essential if they are to have a positive impact. I hope our work will guide their design and implementation so that AI tutors can be used to supplement and enhance teaching experiences.
AI tutors should never be seen as a replacement for human teachers. However, they may offer meaningful psychological and behavioural support for students working independently on challenging material. Used carefully and with pedagogical insight, they have the potential to complement human teaching by supporting key learning behaviours, such as effort, perseverance, resilience, and challenge-seeking, particularly in moments when one-to-one guidance is not available.
This project was funded through the 2024 Accelerate-C2D3 funding call for novel applications of AI for research and innovation. You can read more about other funded projects here.