I'm a PhD researcher at the University of Oxford, where I focus on language model explainability and interpretability. My current research explores whether models can reliably explain their outputs in natural language, and what the implications are for broader human-computer interaction. I've previously worked on mechanistic interpretability problems and continue to contribute to this area, though it is less of a priority for me at the moment.
Aside from my main PhD research, I do a lot of work on LLM evals more broadly. I was part of the team behind the LingOly reasoning benchmark, which was presented as an oral at NeurIPS 2024 (top 0.5% of papers), and we've also recently released LingOly-TOO - check it out! Beyond individual benchmarks, I'm interested in building more rigorous evaluation standards and principled ways to aggregate results across many benchmarks, and I'm currently involved in several projects working towards this goal.
I'm a member of the Reasoning with Machines AI Lab and am supervised by Dr Adam Mahdi (Oxford Internet Institute) and Professor Jakob Foerster (Department of Engineering Science).