Harry Mayne

PhD @ University of Oxford

LLM explainability & interpretability.

I'm an Astra Fellow with Owain Evans (Truthful AI) and a PhD researcher at the University of Oxford.
My research evaluates LLM self-explanations: the natural language explanations LLMs give to justify their own decision-making. I measure whether these explanations are faithful to models' true internal reasoning, and develop new training incentives to improve faithfulness. I'm motivated by AI safety, and also work on LLM evals and the science of evals, including the LingOly and LingOly-TOO reasoning benchmarks. My work has been published at leading venues including NeurIPS, ICLR, and EMNLP. I'm supervised by Prof. Adam Mahdi and Prof. Jakob Foerster.
Selected Publications
A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior
H Mayne*, J Kang*, D Gould, K Ramchandran, A Mahdi, N Siegel.
Under review at ICML 2026

LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation
J Khouja, K Korgul, S Hellsten, L Yang, V Neacsu, H Mayne, R Kearns, A Bean, A Mahdi.
ICLR 2026

LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
H Mayne, R O Kearns, Y Yang, A M Bean, E Delaney, C Russell, A Mahdi.
EMNLP 2025

Toxic Neurons Aren't Enough to Explain DPO: A Mechanistic Analysis for Toxicity Reduction
Y Yang, F Sondej, H Mayne, A Mahdi.
EMNLP 2025

Measuring what Matters: Construct Validity in Large Language Model Benchmarks
A M Bean, R O Kearns, A Romanou, F S Hafner, H Mayne, et al.
NeurIPS 2025

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages
A Bean, S Hellsten, H Mayne, J Magomere, E A Chi, R Chi, S A Hale, H R Kirk.
NeurIPS 2024 [Oral, top 0.5% of papers]

Can Sparse Autoencoders be used to Decompose and Interpret Steering Vectors?
H Mayne, Y Yang, A Mahdi.
Interpretable AI: Past, Present and Future @ NeurIPS 2024

Large language models can help boost food production, but be mindful of their risks
D De Clercq, E Nehring, H Mayne, A Mahdi.
Frontiers in Artificial Intelligence, 2024

Unsupervised learning approaches for identifying ICU patient subgroups: Do results generalise?
H Mayne, G Parsons, A Mahdi.
2024

I'm a final-year PhD researcher. I've had an unusual path, having originally studied economics.
Education
University of Oxford
DPhil Social Data Science · LLM explainability and interpretability
2023 – 2026
University of Oxford
MSc Social Data Science · Distinction (77%). OII Thesis Prize
2022 – 2023
University of Cambridge
BA Economics · Double First Class Honours. Patrick Cross Prize
2019 – 2022
Positions
Astra Fellow with Owain Evans (Truthful AI)
LLM generalisation during finetuning. Based at Constellation, Berkeley.
2026 – Present
SPAR with Noah Siegel (Google DeepMind)
Developing new explanatory faithfulness metrics.
2025 – Present
AI Advisor, International Growth Centre
Using AI to aid public service delivery in developing countries.
2025 – Present
Grants & Awards
Grand Union DTP, Economic and Social Research Council
Full PhD Scholarship (MSc + DPhil)
2022 – 2026
Dieter Schwarz Foundation
Research agenda sponsorship
2024 – 2026

Teaching

I've held several teaching positions, including TAing the Social Data Science MSc at Oxford and tutoring Stanford computer science students. Previous students have gone on to the CS Master's at Stanford and various PhD positions at Oxford.
Stanford University
Machine Learning · 2023–2025
Personalised ML and AI tutorials for Stanford CS undergraduates on a semester abroad.
University of Oxford
Applied Analytical Statistics · 2023–2024
Teaching Assistant for the Social Data Science MSc.
Oxmedica / Mawhiba
AI and Big Data · 2024
Tutor at the Oxmedica/Mawhiba Summer Enrichment Program in Saudi Arabia.

Blog and resources

I mainly write about AI safety, explainability, and evals.
LLMs Don't Know Their Own Decision Boundaries
Summarising our EMNLP 2025 paper on self-generated counterfactual explanations.
September 2025
New to AI?
A curated reading list for getting started with AI and machine learning.
September 2025
AI Safety Researchers Should Care About Eval Quality
Why evaluation methodology matters for AI safety research.
August 2025
University of Cambridge Economics Interview Questions
A collection of practice interview questions for prospective Cambridge economics applicants.
December 2024

Contact

harry.mayne [at] oii.ox.ac.uk