Adrià Garriga-Alonso

AI Alignment Researcher
Technical Co-founder @ Dokimasia

I research how to make AI systems safe and interpretable. My work on Automated Circuit Discovery (ACDC) helped establish mechanistic interpretability as a field, with ~400 citations and adoption across major AI labs.

Currently building Dokimasia, helping humans realize their values when using computers. Previously led interpretability research at FAR AI and worked at Redwood Research. PhD from the University of Cambridge on Bayesian neural networks.

Experience

2026–present
Technical Co-founder
Dokimasia
Building tools to help humans realize their values when using computers.
2023–2025
Research Scientist
FAR AI
Led interpretability research, managed a team of 3, and collaborated with 11 researchers. Built GPU infrastructure (8–80 GPUs).
2022–2023
Member of Technical Staff
Redwood Research
Correctness testing for optimizing compilers. Mentored 8 interns.
2017–2021
PhD in Machine Learning
University of Cambridge
Thesis: "Priors in finite and infinite Bayesian convolutional neural networks"

Selected Publications

Towards Automated Circuit Discovery for Mechanistic Interpretability
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
NeurIPS 2023 (Spotlight)
~400 citations
Deep Convolutional Networks as Shallow Gaussian Processes
A. Garriga-Alonso, L. Aitchison, C.E. Rasmussen
ICLR 2019
~330 citations
Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
L. Chan, A. Garriga-Alonso, N. Goldowsky-Dill, R. Greenblatt, et al.
Alignment Forum 2022
~90 citations

Recent Writing