I'm an Astra Fellow with Owain Evans (Truthful AI) and a PhD researcher at the University of Oxford.
My research evaluates LLM self-explanations: the natural language explanations LLMs give to justify their own decision-making. I measure whether these explanations are faithful to models' true internal reasoning, and develop new training incentives to improve faithfulness. I'm motivated by AI safety, and also work on LLM evals and the science of evals, including the LingOly and LingOly-TOO reasoning benchmarks. My work has been published at leading venues including NeurIPS, ICLR, and EMNLP. I'm supervised by Prof. Adam Mahdi and Prof. Jakob Foerster.
Selected Publications
A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior H Mayne*, J Kang*, D Gould, K Ramchandran, A Mahdi, N Siegel. Under review at ICML 2026
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation J Khouja, K Korgul, S Hellsten, L Yang, V Neacsu, H Mayne, R Kearns, A Bean, A Mahdi. ICLR 2026
LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations H Mayne, R O Kearns, Y Yang, A M Bean, E Delaney, C Russell, A Mahdi. EMNLP 2025
Measuring what Matters: Construct Validity in Large Language Model Benchmarks A M Bean, R O Kearns, A Romanou, F S Hafner, H Mayne, et al. NeurIPS 2025
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages A Bean, S Hellsten, H Mayne, J Magomere, E A Chi, R Chi, S A Hale, H R Kirk. NeurIPS 2024 [Oral, top 0.5% papers]
Can Sparse Autoencoders be used to Decompose and Interpret Steering Vectors? H Mayne, Y Yang, A Mahdi. Interpretable AI: Past, Present and Future @ NeurIPS 2024
Large language models can help boost food production, but be mindful of their risks. D De Clercq, E Nehring, H Mayne, A Mahdi. Frontiers in Artificial Intelligence, 2024
I'm a final-year PhD researcher. I've had an unusual path, having originally studied economics.
Education
University of Oxford
DPhil Social Data Science · LLM explainability and interpretability
2023 – 2026
University of Oxford
MSc Social Data Science · Distinction (77%). OII Thesis Prize
2022 – 2023
University of Cambridge
BA Economics · Double First Class Honours. Patrick Cross Prize
2019 – 2022
Positions
Astra Fellow with Owain Evans (Truthful AI)
LLM generalisation during finetuning. Based at Constellation, Berkeley.
2026 – Present
SPAR with Noah Siegel (Google DeepMind)
Developing new explanatory faithfulness metrics.
2025 – Present
AI Advisor, International Growth Centre
Using AI to aid public service delivery in developing countries.
2025 – Present
Grants & Awards
Grand Union DTP, Economic and Social Research Council
Full PhD Scholarship (MSc + DPhil)
2022 – 2026
Dieter Schwarz Foundation
Research agenda sponsorship
2024 – 2026
Teaching
I've held several teaching positions, including TA-ing the Social Data Science MSc at Oxford and tutoring Stanford computer science students. Previous students have gone on to the CS Master's at Stanford and various PhD positions at Oxford.
Stanford University Machine Learning · 2023–2025 Personalised ML and AI tutorials for Stanford CS undergraduates on semester abroad.
University of Cambridge Economics Interview Questions A collection of practice interview questions for prospective Cambridge economics applicants. December 2024