Researching methods to make AI systems safe and interpretable. Senior author on Automated Circuit Discovery (ACDC), a subfield-defining work in mechanistic interpretability with ~400 citations, now adopted across major AI laboratories.
Currently building DOKIMASIA: tools to help humans realize their values when using computers. Previously led interpretability research at FAR AI and built testing infrastructure at Redwood Research. PhD from the University of Cambridge on Bayesian neural networks, supervised by Carl E. Rasmussen.