Adrià Garriga-Alonso
🌸 🌺 🌿 🍃

Hello, I'm Adrià

Making AI safe and understandable

I research AI alignment and mechanistic interpretability. My work on Automated Circuit Discovery helped establish methods for understanding what happens inside neural networks. Currently co-founding Dokimasia, building tools for value-aligned computing.

🌱

Experience

2026 — Present
Technical Co-founder
Dokimasia
Building tools to help humans realize their values when using computers.
2023 — 2025
Research Scientist
FAR AI
Led interpretability research, managed team of 3. Built GPU infrastructure (8-80 GPUs).
2022 — 2023
Member of Technical Staff
Redwood Research
Correctness testing for optimizing compilers. Mentored 8 interns.
2017 — 2021
PhD Machine Learning
University of Cambridge
Bayesian neural networks with Carl E. Rasmussen. First to show that infinitely wide CNNs converge to Gaussian processes.
2016 — 2017
MSc Computer Science
University of Oxford
Graduated with Distinction.
2012 — 2016
BSc Computer Science
Pompeu Fabra University
1st in class. la Caixa Fellowship recipient.
🌷

Research

NeurIPS 2023 Spotlight Towards Automated Circuit Discovery for Mechanistic Interpretability
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
✨ ~400 citations
ICLR 2019 Deep Convolutional Networks as Shallow Gaussian Processes
A. Garriga-Alonso, L. Aitchison, C.E. Rasmussen
✨ ~330 citations
Alignment Forum 2022 Causal Scrubbing: Rigorously Testing Interpretability Hypotheses
L. Chan, A. Garriga-Alonso, N. Goldowsky-Dill, et al.
✨ ~90 citations
2025 Open Problems in Mechanistic Interpretability
L. Sharkey, B. Chughtai, [...], A. Garriga-Alonso, et al.
✨ ~100 citations
🌻

Writing

Why Deep Learning Works: Specificity, Not Flexibility
On Consciousness and Moral Weight
Testing Integrals: A Practical Guide
On Killing vs Letting Die
Remote Development with Unison
Alternative Population Ethics