Adrià Garriga-Alonso

Making AI systems safe and interpretable

About

I work on AI alignment and mechanistic interpretability. My research on Automated Circuit Discovery (ACDC) helped establish circuit-based interpretability as a field, with roughly 400 citations and methods adopted across major AI labs. ACDC provides automated methods for discovering computational circuits in neural networks, reducing the manual labor previously required for mechanistic analysis.

Currently I'm building Dokimasia, tools that help humans realize their values when using computers by shielding them from unwanted and false information. Previously I was a research scientist at FAR AI (leading interpretability projects and building GPU infrastructure), a member of technical staff at Redwood Research, and a PhD student at Cambridge studying Bayesian neural networks under Carl Rasmussen.

My research interests center on three questions: How can we evaluate interpretability explanations? How can we find algorithmic explanations at lower cost? And what explains the behavior of agent-like AIs: what do they want?

Curriculum Vitae

Experience

2026–
Technical Co-founder
Dokimasia · Remote
2023–2025
Research Scientist
FAR AI · Berkeley, CA
2022–2023
Member of Technical Staff
Redwood Research · Berkeley, CA
2021
Summer Research Fellow
Center on Long-Term Risk
2019
Research Intern
Microsoft Research Cambridge

Education

2017–2021
PhD Machine Learning
University of Cambridge · Supervisor: Carl E. Rasmussen
2016–2017
MSc Computer Science (Distinction)
University of Oxford
2012–2016
BSc Computer Science (1st in class)
Pompeu Fabra University, Barcelona

Selected Papers

Writing

Notes and essays, continuously updated. Sorted by last edit.