About
I work on AI alignment and mechanistic interpretability. My research on Automated Circuit Discovery (ACDC) helped establish circuit-based interpretability as a field, with ~400 citations and methods adopted across major AI labs. ACDC provides automated methods for discovering computational circuits in neural networks, reducing the manual labor previously required for mechanistic analysis.
Currently I'm building Dokimasia, tools that help people realize their values when using computers by shielding them from unwanted and false information. Previously I was a research scientist at FAR AI (leading interpretability projects and building GPU infrastructure), a member of technical staff at Redwood Research, and a PhD student at Cambridge studying Bayesian neural networks under Carl Rasmussen.
My research interests center on three questions: How can we evaluate interpretability explanations? How can we find algorithmic explanations at lower cost? And what explains the behavior of agent-like AIs, i.e. what do they want?
Curriculum Vitae
Experience
Education
Selected Papers
Writing
Notes and essays, continuously updated and sorted by most recent edit.
- Why Deep Learning Works: Specificity, Not Flexibility ml 2026-01-15
- On Consciousness and Moral Weight philosophy 2025-12-20
- Testing Integrals: A Practical Guide math 2025-11-08
- On Killing vs Letting Die ethics 2025-10-14
- Remote Development with Unison tools 2025-09-22
- Alternative Population Ethics philosophy 2025-08-05