Safonov Alexey
Evyatar Or
Build a database of human-values alignment judgments based on fictional scenarios, both to distribute or sell as training data and to serve as a benchmark for alignment with human values.
Sean Peters
An early-stage AI safety research group based in Sydney, Australia
Karsten Brensing
Limited Legal Personhood as a Reversible Safety Instrument
Zaelani
18+ preprints across multiple fields, all written on a 2GB RAM phone. $600 removes the only thing standing between me and the next body of work.
Aashka Patel
Redirecting India’s Middle‑Schoolers into AI Safety, Governance, and X‑Risk Work
Sean Kwon
Open source agent monitoring tools to detect failures, infinite loops, and unsafe behavior in production AI systems
Dhruv Yadav
Auditing and improving LLM-as-a-judge systems via interpretable aggregation of preferences
Jonathan Elsworth Eicher
Ahmed Dawoud
An advanced agent that perceives your screen and executes tasks by controlling the mouse, acting as a digital proxy to handle complex work on your behalf.
Linh Le
Rishub Jain
Jessica Pu Wang
Germany’s talents are critical to the global effort of reducing catastrophic risks brought by artificial intelligence.
Matei-Alexandru Anghel
A Safety Framework for Evaluating AI Humanity Alignment Through Progressive Escalation and Scope Creep
Lawrence Wagner
A benchmark for studying how failures spread across multi-agent AI systems and whether they can be detected and interrupted in time.
Pedro Bentancour Garin
Runtime safety, oversight, rollback, and control infrastructure for advanced AI in real-world, high-consequence environments.
Johan Fredrikzon
Designing a Project Funding Proposal
Wasim Gadwal
Observability and interpretability toolkit for world models in AI safety and mechanistic interpretability research.
Sung Hun Kwag
An open-source safety pilot for detecting metric gaming, pseudo-improvement, and oversight evasion
Remmelt Ellen