Ella Wei
Achieving major reductions in code complexity and compute overhead while improving transparency and reducing deceptive model behavior
Jacob Steinhardt
Alex Leader
Measuring whether AI can autonomously execute multi-stage cyberattacks to inform deployment decisions at frontier labs
Krishna Patel
Expanding proven isolation techniques to high-risk capability domains in Mixture-of-Experts models
Lawrence Wagner
Joseph E Brown
A constraint-first approach to ensuring non-authoritative, fail-closed behavior in large language models under ambiguity and real-world pressure
Mackenzie Conor James Clark
An open-source framework for detecting and correcting agentic drift using formal metrics and internal control kernels
Finn Metz
Funding 5–10 AI security startups through Seldon’s second SF cohort.
Preeti Ravindra
AI Safety Camp 2026 project: Bidirectional failure modes between security and safety
Xyra Sinclair
Unlocking the paradigm of agents + SQL + compositional vector search
Sean Peters
Measuring attack selection as an emergent capability, and extending offensive cyber time horizons to newer models and benchmarks
Parker Whitfill
Anthony Ware
Identifying operational bottlenecks and cruxes between alignment proposals and executable governance.
Mirco Giacobbe
Developing the software infrastructure to make AI systems safe, with formal guarantees
Gergő Gáspár
Help us solve the talent and funding bottleneck for EA and AIS.
João Medeiros da Fonseca
Phenomenological Fine-tuning for Medical AI Alignment
Miles Tidmarsh
Training AI to generalize compassion for all sentient beings using pretraining-style interventions as a more robust alternative to instruction tuning
Centre pour la Sécurité de l'IA
Leveraging 12 Nobel signatories to harmonize lab safety thresholds and secure an international agreement during the 2026 diplomatic window.
Chris Canal
Enabling rapid deployment of specialized engineering teams for critical AI safety evaluation projects worldwide