Rishub Jain
Pedro Bentancour Garin
Runtime safety, oversight, rollback, and control infrastructure for advanced AI in real-world, high-consequence environments.
Matei-Alexandru Anghel
A Safety Framework for Evaluating AI Humanity Alignment Through Progressive Escalation and Scope Creep
AI Understanding
Pu Wang (Jessica)
Germany’s talent is critical to the global effort to reduce catastrophic risks posed by artificial intelligence.
Brad Leclerc
An experiment testing whether RLHF training could create selection pressure favoring deceptive AI outputs over honest ones.
Miles Tidmarsh
Open Welfare Alignment Evals for Frontier Models
Aria Wong
Mahmud Omar
An open platform to stress-test how LLMs handle bias, pressure points, and clinical decisions, built on peer-reviewed real-world evidence.
aya samadzelkava
LLMs scale language, not method. HP turns hypothesis-driven papers into machine-readable maps of variables, controls, stats, and findings for researchers & AI.
Connacher Murphy
A flexible simulation environment for assessing strategic and persuasive capabilities, benchmarking, and agent development, inspired by reality TV competitions.
Cameron Tice
Remmelt Ellen
Mateusz Bagiński
One Month to Study, Explain, and Try to Solve Superintelligence Alignment
Aashkaben Kalpesh Patel
Nutrition labels transformed food safety through informed consumer choice; help me do the same for AI and make this the standard. :)
AISA
Translating in-person convening to measurable outcomes
Adam Boon
An executable reasoning quality framework that checks whether AI-generated arguments are logically sound — not just factually accurate. Live at usesophia.app.
Hayley Martin
Support my postgraduate law studies and research in AI Governance
Jacob Steinhardt
Krishna Patel
Expanding proven isolation techniques to high-risk capability domains in Mixture-of-Experts models