Adrià Garriga-Alonso
🌸 🌺 🌿 🍃

Hello, I'm Adrià

Making AI safe and understandable

I research AI alignment and mechanistic interpretability. My work on Automated Circuit Discovery helped establish methods for understanding what happens inside neural networks. Currently co-founding Dokimasia, building tools for value-aligned computing.

🌱

Experience

2026 — Present
Technical Co-founder
Dokimasia
Building tools to help humans realize their values when using computers.
2023 — 2025
Research Scientist
FAR AI
Led interpretability research, managed team of 3. Built GPU infrastructure (8-80 GPUs).
2022 — 2023
Member of Technical Staff
Redwood Research
Correctness testing for optimizing compilers. Mentored 8 interns.
2017 — 2021
PhD Machine Learning
University of Cambridge
Bayesian neural networks with Carl E. Rasmussen. First to show that infinitely wide CNNs converge to Gaussian processes.
2016 — 2017
MSc Computer Science
University of Oxford
Graduated with Distinction.
2012 — 2016
BSc Computer Science
Pompeu Fabra University
1st in class. la Caixa Fellowship recipient.
🌷

Research

NeurIPS 2023 Spotlight Towards Automated Circuit Discovery for Mechanistic Interpretability
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
✨ ~400 citations
ICLR 2019 Deep Convolutional Networks as Shallow Gaussian Processes
A. Garriga-Alonso, L. Aitchison, C.E. Rasmussen
✨ ~330 citations
Alignment Forum 2022 Causal Scrubbing: Rigorously Testing Interpretability Hypotheses
L. Chan, A. Garriga-Alonso, N. Goldowsky-Dill, et al.
✨ ~90 citations
2025 Open Problems in Mechanistic Interpretability
L. Sharkey, B. Chughtai, [...], A. Garriga-Alonso, et al.
✨ ~100 citations
🌻

Writing

Why Deep Learning Works: Specificity, Not Flexibility
On Consciousness and Moral Weight
Testing Integrals: A Practical Guide
On Killing vs Letting Die
Remote Development with Unison
Alternative Population Ethics