Help AIs create AI safety tools

AI evaluations, AI control settings and cyber ranges (simulated networks) are key pieces of infrastructure for AI safety. They let us forecast and govern AI capabilities, and they can also help us build and iterate on automated defences. But there's a problem: evals are difficult to build, and as models improve and solve harder tasks, humans' ability to build them may not keep up.

My project will explore whether AI agents can dramatically scale production of these defences. We have a narrow window: AI is now capable enough to accelerate parts of safety research, but not yet so capable that it can evade our monitoring. If this works, mass-produced defensive infrastructure (both cybersecurity and Redwood-style AI control) could buy humanity significant time for alignment solutions.

About me: I was a founding member of the UK AI Safety Institute, where I worked in a TPM-like role and led the world's largest AI safety evaluations programme (~200 researchers, ~12,000 person-hours) and the first government-run pre-deployment autonomy testing. I'm now moving to work full-time on building the infrastructure and products that I wish we'd had.

I've received generous personal funding from Manifund via ACX Grants, so I'm not fundraising for the next few months. This means that right now I'm primarily looking for collaborators who could later become cofounders. We'd map out the evals/control pipeline and test approaches to automating it, such as scaffolded agents, fine-tuning and simple tool development. This would be a very different kind of safety research org: I'm excited about creating a high-velocity, automation-first group that builds AGI-proof technical defensive tools.

If you have strong SWE skills, are ambitious about having a big impact in AI safety & security, and are willing to work in London, please reach out.