MISSION BRIEFING // SEC_LEVEL: PUBLIC

Researching methods to make AI systems safe and interpretable. Senior author of Automated Circuit Discovery (ACDC), a subfield-defining work in mechanistic interpretability with ~400 citations, now adopted by major AI laboratories.

Currently building DOKIMASIA: tools to help humans realize their values when using computers. Previously led interpretability research at FAR AI and built testing infrastructure at Redwood Research. PhD from Cambridge on Bayesian neural networks, supervised by Carl E. Rasmussen.

OPERATIONAL HISTORY // TIMELINE
2026 — PRESENT
Technical Co-founder
DOKIMASIA
Building tools for value-aligned computing: a shield against unwanted information.
2023 — 2025
Research Scientist
FAR AI
Led interpretability research. Managed a team of 3 and collaborated with 11 others. Built GPU infrastructure (8-80 GPUs).
2022 — 2023
Member of Technical Staff
Redwood Research
Built correctness-testing infrastructure for optimizing compilers. Mentored 8 interns across 4 projects.
2017 — 2021
PhD Machine Learning
University of Cambridge
Thesis: "Priors in finite and infinite Bayesian convolutional neural networks"
2016 — 2017
MSc Computer Science
University of Oxford // Distinction
2012 — 2016
BSc Computer Science
Pompeu Fabra University // 1st in class
RESEARCH OUTPUT // SELECTED
NeurIPS 2023 // Spotlight
Towards Automated Circuit Discovery for Mechanistic Interpretability
A. Conmy, A. Mavor-Parker, A. Lynch, S. Heimersheim, A. Garriga-Alonso
▲ ~400 CITATIONS
ICLR 2019
Deep Convolutional Networks as Shallow Gaussian Processes
A. Garriga-Alonso, L. Aitchison, C.E. Rasmussen
▲ ~330 CITATIONS
Alignment Forum
Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
L. Chan, A. Garriga-Alonso, N. Goldowsky-Dill, R. Greenblatt, et al.
▲ ~90 CITATIONS
FIELD REPORTS // CONTINUOUSLY UPDATED