Adrià Garriga-Alonso

AI Alignment Researcher
Technical Co-founder @ Dokimasia

I research how to make AI systems safe and interpretable. My work on Automated Circuit Discovery (ACDC) helped establish mechanistic interpretability as a field, with ~400 citations and adoption across major AI labs.

Currently building Dokimasia, helping humans realize their values when using computers. Previously led interpretability research at FAR AI and worked at Redwood Research. PhD from the University of Cambridge on Bayesian neural networks.

Experience

2026–present
Technical Co-founder
Dokimasia
Building tools to help humans realize their values when using computers.
2023–2025
Research Scientist
FAR AI
Led interpretability research, managed a team of 3, and collaborated with 11 researchers. Built GPU infrastructure (8–80 GPUs).
2022–2023
Member of Technical Staff
Redwood Research
Correctness testing for optimizing compilers. Mentored 8 interns.
2017–2021
PhD in Machine Learning
University of Cambridge
Thesis: "Priors in finite and infinite Bayesian convolutional neural networks"

Selected Publications

Towards Automated Circuit Discovery for Mechanistic Interpretability
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
NeurIPS 2023 (Spotlight)
~400 citations
Deep Convolutional Networks as Shallow Gaussian Processes
A. Garriga-Alonso, L. Aitchison, C.E. Rasmussen
ICLR 2019
~330 citations
Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
L. Chan, A. Garriga-Alonso, N. Goldowsky-Dill, R. Greenblatt, et al.
Alignment Forum 2022
~90 citations

Recent Writing