Anthony Duong
David Chanin
Arifa Khan
The Reputation Circulation Standard - Implementation Sprint
Jord Nguyen
LLMs often know when they are being evaluated. We’ll do a study comparing various methods to measure and monitor this capability.
Itay Yona
Sustaining and Scaling a Grassroots Research Collective for Neural Network Interpretability and Control
Sudarsh Kunnavakkam
Building model organisms of CoT and Python packages for intervention in reasoning traces
Belinda Mo
A comedy that gets people thinking about AI in society
Bryce Meyer
Kristina Vaia
The official AI safety community in Los Angeles
Chi Nguyen
Making sure AI systems don't mess up acausal interactions
Apart Research
Funding ends June 2025: Urgent support for proven AI safety pipeline converting technical talent from 26+ countries into published contributors
Sarah Wiegreffe
https://actionable-interpretability.github.io/
Igor Ivanov
Asterisk Magazine
Connor Axiotes
Geoffrey Hinton & Yoshua Bengio Interviews Secured, Funding Still Needed
Tamar Rott Shaham
Jim Maar
Reproducing the Claude poetry planning results quantitatively
Steve Petersen
Teleology, agential risks, and AI well-being
Jaeson Booker
Creating a fund exclusively focused on supporting AI Safety Research