Luthien is a non-profit developing AI Control for immediate real-world deployments, based on Redwood Research's AI Control agenda. https://luthienresearch.org/
Luthien's ultimate goal is to increase the probability that effective AI Control systems will be deployed to mitigate catastrophic risks from frontier AI systems once such systems are developed.
We think that getting real-world feedback as soon as possible on how Control systems perform, and iterating aggressively to develop practical, effective Control systems, significantly increases the odds that such systems will be deployed to mitigate otherwise-catastrophic risks.
To that end, we're developing AI Control systems targeting prosaic, lower-stakes scenarios, like occasionally-misbehaving coding assistants. By doing so, we hope to discover and solve the kinds of unforeseen problems that emerge whenever a new type of system is deployed, develop a playbook for effective AI Control deployments, establish standards and best practices for Control systems, and test and refine Control strategies in real-world situations. We also want to see how far we can push automated red/blue-teaming to develop effective strategies quickly.
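To make "AI Control system" a bit more concrete, here is a minimal illustrative sketch of a trusted-monitoring loop in the spirit of Redwood's Control agenda: an untrusted model proposes a patch, a weaker trusted monitor scores it for suspicion, and sufficiently suspicious patches get escalated to a human instead of being applied. This is not Luthien's actual code; the function names and the threshold are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str       # "allow" or "escalate"
    suspicion: float  # monitor's suspicion score in [0, 1]

# Hypothetical threshold; tuning it is part of the blue team's strategy.
SUSPICION_THRESHOLD = 0.7

def untrusted_model(task: str) -> str:
    """Stand-in for the untrusted coding assistant proposing a patch."""
    return f"# patch for: {task}\nprint('hello')"

def trusted_monitor(patch: str) -> float:
    """Stand-in for a weaker, trusted model scoring how suspicious a patch looks."""
    return 0.9 if "os.system" in patch else 0.1

def control_step(task: str) -> Decision:
    """One pass of the control loop: generate, monitor, then allow or escalate."""
    patch = untrusted_model(task)
    suspicion = trusted_monitor(patch)
    if suspicion >= SUSPICION_THRESHOLD:
        # Too suspicious: route to human review instead of applying the patch.
        return Decision(action="escalate", suspicion=suspicion)
    return Decision(action="allow", suspicion=suspicion)

print(control_step("add a logging helper"))
```

Real deployments add a lot on top of this skeleton (auditing budgets, deferred execution, resampling, and so on), and finding out which of those pieces matter in practice is exactly the kind of feedback we're after.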
Luthien's secondary goal is to establish an AI Safety presence in Seattle, where there is currently a great deal of latent talent but very few opportunities to onboard into the AI Safety space.
Funding will be used for salary, API credits, other compute infrastructure, and org logistics like attending ControlConf.
Our minimum funding goal ($500) buys us a slightly longer runway (salary, hiring contractors for part-time work, API credits, other compute infrastructure, and org logistics like attending ControlConf).
At current funding levels, Luthien is on the verge of being able to hire a second person. At $5,000 I'll be reasonably confident that Luthien can hire a second person full-time, because (1) I'll take it as a signal that a relatively small investment of time in fundraising can net donations that more than pay for the time lost, and (2) with two people there will be slightly more slack, making it possible to do more focused deep work without devoting a large fraction of org resources to fundraising.
Beyond this, donations buy runway, and therefore more time for focused deep work rather than fundraising.
Past $100k it potentially becomes possible to hire a third person.
Luthien currently consists of me, Jai Dhyani. I'm a former ML software engineer at Meta, a MATS alum, and a co-author of "RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts".
We're unable to develop Control solutions that deliver enough benefit to justify the requisite costs in money, latency, and complexity for real-world use cases.
Immediate real-world use cases turn out to be so different from high-stakes scenarios across the relevant dimensions that work on the former yields little to no payoff for developing and deploying effective Control systems in the latter.
~$160k. $150k of this is a one-time seed grant from the AI Safety Tactical Opportunities Fund meant to get us off the ground and spur further donations. The remainder is from individuals donating ~$5k each.