Timelines to AGI could be very short (2-4 years), and we need solid plans to address the risks. Although some useful general outlines of plans and a few detailed scenario analyses have been produced, comprehensive plans remain underspecified. This project will establish a prestigious contest that elicits proposals which address short-timeline scenarios, fill in the gaps of existing plans, and red-team AGI safety strategies. Submissions should address the strong possibility of short AGI timelines (2-4 years), no major safety breakthroughs during that time, and uncertainty about future events. The winning proposals will receive expert feedback from the judges with the goal of refining and synthesizing the best parts, followed by broader public dissemination and discussion that can lay the foundation for follow-up work. This contest will advance discussion and preparation of AGI safety and governance measures by identifying gaps and failure-mode mitigations for existing strategies and theories of victory.
Creating clarity around AGI safety strategy. This contest will elicit a wide variety of proposals for making implementation details concrete, clarifying potential failure modes, and suggesting improvements.
Incentivizing submissions with prizes, prestige, and recognition. Contributions can range from full plans to tactical modules that address a key part of plans. We will ensure room for rewarding a wide variety of useful submissions.
Focusing attention on existing plans and the hard parts of the problem. Specifically, this contest will accept submissions that highlight improvements or crucial considerations regarding both alignment and governance plans. Two examples of strong plans that can serve as a base for engaging with these questions are:
Google’s A Technical Approach to AGI Safety and Security is the most thorough example of the former.
MIRI’s AI Governance to Avoid Extinction research agenda seems to be the most thorough example of the latter.
Targeting outreach toward people well suited to build on and engage with plans. The contest will solicit submissions from experts in planning, AI safety, governance, and strategy, while also maintaining an open submission process to discover novel talent and ideas.
The contest will aim for submissions which are useful and adaptable:
Concrete: Ideally include sample timelines, prioritization (must-have vs ideal ingredients), threat models, tradeoff justifications, failure-mode mitigations, team structures, estimated costs, or implementation bottlenecks.
Modular: Can be adapted to multiple organizational contexts (e.g., labs, governments, or coalitions). Some entries should address government involvement explicitly, like Manhattan or CERN for AI proposals.
Forecast-aware: Conditioned on short timelines but responsive to developments that could shift that picture.
Counterfactually useful: Even if not implemented in full, a good plan should sharpen collective understanding or reveal overlooked assumptions.
More details can be found in Draft Submission Guidelines & Judging Criteria below.
Contests have previously led to useful AI safety contributions: The Eliciting Latent Knowledge contest was successful at eliciting counterarguments and key considerations that the hosts hadn’t considered even after hundreds of hours of thought. They found these submissions valuable enough to pay out $275,000. Directing this kind of attention at AI safety plans when we may live in a short timeline world seems prudent and worth trying.
Incentivizing planning for AGI safety, which is a distinct type of work and skillset. By default, we should expect that not enough planning will happen. History shows society generally underprepares for the possibility of dramatic changes (e.g. Covid-19). We also should not assume that good technical or research takes translate into good strategic takes: planning is a distinct activity and skillset. It’s important to invest time specifically in developing strategic clarity by specifying priorities, plan details, and limitations. When we don’t plan adequately for future scenarios, we are forced to improvise under time constraints. If we had made plans in advance of Covid-19, we probably could have done much better: we could have had an early warning system, low-cost near-perfect PPE, better contact tracing, and faster vaccine manufacturing.
There are several plausible pathways for improving AI safety plans
Some ways this might work:
Having many people surface crucial considerations and assess alternate scenarios can make existing plans more useful.
A plan could cover the right scenarios but be missing details that would make it easier to implement, more effective, or more robust.
There could be good proposals for parts of the problem, but it may be unclear whether they address the hard parts. We need plans that at least address those hard parts, even if none of them are fully solved.
Prizes 15-25k
10-14k first
3-6k second
1-2k third
$500 each for 10 honorable mentions.
Operations 15-20k
Average 15 h/week * $48.76/h for 6 months (26 weeks)
Probably more front-loaded
Scope out strategy, talk to previous contest hosts, set up logistics, secure judges, advertise to groups, headhunt promising participants, answer questions, coordinate dissemination strategy
Improve submission guidelines and judging criteria; talk to more people and get feedback on this document.
Logistics: ~2-5k
Setting up submission system
Website
Targeted paid promotion
Includes budget for a potential award ceremony, which may be virtual
Disseminating and marketing/popularizing winning proposals ~5-10k
Budget safety buffer: ~$5k (7.5-10% of total)
Preliminary review to shortlist the best plans for thorough judge review.
Estimate: 100 submissions * 30 min review * 3 reviewers * $50/hr = $7.5-10k (see the cost sketch below)
Judge discussions to surface valuable insights as well.
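To make the arithmetic behind these estimates easy to check and adjust, here is a minimal sketch of the cost calculations used across the budget tiers; the function names and structure are illustrative assumptions, not an official budgeting tool.

```python
# Illustrative budget arithmetic; function names and structure are assumptions
# made for this sketch, not part of the proposal itself.

def review_honoraria(n_submissions, minutes_per_review, n_reviewers, hourly_rate):
    # Each reviewer spends (n_submissions * minutes_per_review) minutes in total.
    hours_per_reviewer = n_submissions * minutes_per_review / 60
    return hours_per_reviewer * n_reviewers * hourly_rate

def operations_cost(hours_per_week, hourly_rate, weeks):
    # Organizer time over the contest period.
    return hours_per_week * hourly_rate * weeks

# Preliminary review, minimum budget: 100 submissions * 30 min * 3 reviewers * $50/hr
print(review_honoraria(100, 30, 3, 50))   # 7500.0, the lower end of the $7.5-10k range

# Operations: 15 h/week * $48.76/hr * 26 weeks
print(operations_cost(15, 48.76, 26))     # 19016.4, consistent with the $15-20k line
```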
What we can do with additional funds
Larger prizes, more honoraria for judges, marketing/disseminating the winning proposals, expanded outreach and participant support, maybe a retreat for the winners.
Same as minimum budget but with larger prizes & honoraria for judges
Prizes 30-50k
20-30k first
6-10k second
2-4k third
$500 each for 12 honorable mentions.
Honoraria for judges
Estimate: 100 submissions * 45 min review * 3-5 reviewers * $50/hr = 18-25k
Changes from medium budget in bold
Prizes 60-100k
40-50k first
12-24k second
4-8k third
$500 each for 20 honorable mentions.
Operations 20-25k
Logistics: ~15-25k
Setting up submission system
Website
Targeted paid promotion
Retreat for winners to further refine their proposals
Disseminating and marketing/popularizing winning proposals ~30-50k
Budget safety buffer: ~$25k (7.5-10% of total)
Honoraria for judges
Estimate: 100 submissions * 45 min review * 3-5 reviewers * $50/hr = $18-25k
Main Organizer
Peter Gebauer: technical AI governance research manager for ERA in Cambridge (this project is independent and unaffiliated with them); previously directed the Supervised Program for Alignment Research, completed the GovAI summer 2024 fellowship, co-authored Dynamic Safety Cases for Frontier AI, helped with recruiting at Anthropic, and managed communications for a statewide political campaign.
Advisors
Ryan Kidd
Seth Herd
In this contest, we invite your takes on the big picture: if transformative AI is developed soon, how might the world overall navigate that, to reach good outcomes?
We think that tackling this head on could help force us to tackle the difficult and messy parts of the problem. It could help us to look at things holistically, and better align on what might be especially important. And it could help us to start to build more shared visions of what robustly good paths might look like.
Of course, any real trajectories will be highly complex and may involve new technologies that are developed as AI takes off. What is written now will not be able to capture all of that nuance. Nonetheless, we think that there is value in trying.
The format for written submissions is at your discretion, but we ask that you at least touch on each of the following points:
200–400 word executive summary
How the technical difficulties of first building safe transformative AI are surmounted
How the socio-political difficulties of first building safe transformative AI are surmounted
What transformative applications are seen first, and how the world evolves
How the challenges of the-potential-for-an-intelligence-explosion are surmounted (either by navigating an explosion safely, or by coordinating to avoid an explosion)
Biggest uncertainties and weak points of the plan
What are the most likely causes and outcomes if this project fails?
Challenge: people don’t submit
Solutions
Prestigious judges
Lots of smaller prizes to reward many types of valuable submissions
Larger prizes overall
Commitment to disseminating the winning ideas and generating positive impact from them
Challenge: bad submissions
Solutions
Clear criteria for useful submissions.
Reach out to experienced people.
Allow for multiple kinds of submissions - proposing and improving plans.
Facilitate connections between people with complementary expertise and interests.
Challenge: people might delay releasing useful work because of this
Solution
Include prizes for recently published work that would have scored highly in the contest.
Challenge: bad selection of judges
Choose highly respected ones with a track record of solid publications and community standing.
Diverse array of backgrounds and expertise for judging each submission
Challenge: senior researchers afraid to submit
Framing contest as exploratory and collaborative.
Non-binding participation: make it clear participation does not mean an endorsement of a particular timeline or p(doom).
Private evaluation - only finalists’ or winners’ submissions become public.
Ensure judges give constructive feedback.
Anonymous submission option.
Challenge: wasting judges’ time
Prescreen submissions for minimum standards based on judging criteria.
Challenge: inconsistent judging methodology
Standardized rubric
Two judges per finalist submission, with judge scores normalized (see the normalization sketch after this list)
Challenge: conflicts of interest (COIs) implicating judges
Filter for COIs
Double-blind evaluation
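The mitigation above only calls for normalizing judge scores; one possible way to do that (the z-scoring approach, data format, and names below are assumptions for illustration, not a committed methodology) is sketched here.

```python
from collections import defaultdict
from statistics import mean, stdev

def normalize_judge_scores(scores_by_judge):
    """Z-score each judge's raw scores so that harsh and lenient judges
    contribute comparably, then average the normalized scores per submission.

    scores_by_judge: {judge_name: {submission_id: raw_score}}
    Returns: {submission_id: mean_normalized_score}
    """
    normalized = defaultdict(list)
    for judge, scores in scores_by_judge.items():
        raw = list(scores.values())
        mu = mean(raw)
        sigma = stdev(raw) if len(raw) > 1 else 0.0
        sigma = sigma if sigma > 0 else 1.0  # guard against a judge giving identical scores
        for submission, score in scores.items():
            normalized[submission].append((score - mu) / sigma)
    return {submission: mean(zs) for submission, zs in normalized.items()}

# Example: judge_a scores harshly, judge_b leniently; after normalization both
# rank plan_1 > plan_3 > plan_2 on a common scale.
print(normalize_judge_scores({
    "judge_a": {"plan_1": 5, "plan_2": 3, "plan_3": 4},
    "judge_b": {"plan_1": 9, "plan_2": 7, "plan_3": 8},
}))
```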
We think this should produce submissions that will at least fill some gaps in current limited plans, and produce more collective engagement with the theoretical and practical challenges of AGI on the current trajectory.
None
It’s important to know how to allocate talent in order to achieve victory conditions for AGI going well. Talent is limited and must be allocated effectively. Increasing our strategic clarity helps us understand how to allocate talent and how the plethora of work in AI safety and governance can fit together effectively. Figuring out ways to attract, develop, and direct talent is itself an essential part of strategic planning. Finally, a contest can also attract new people.
Contests are generally high variance and useful for surfacing a wide variety of ideas you might not have considered. The Eliciting Latent Knowledge contest seems like the most successful AI safety contest to date: it surfaced many novel key considerations and paid out $275,000 in total for contributions. The Open Philanthropy Worldview Investigations contest also resulted in some excellent analyses.
Some pitfalls of other previous contests: too short, too broad, too targeted at people who are relatively inexperienced. This contest will be higher profile than competitions like a weekend research sprint that targets AI safety newcomers with small prizes.
Timelines to transformative AI could be short: 2-4 years or less. There are few comprehensive plans for making the development of superhuman AI systems go well. A well-designed contest can encourage talented researchers to develop strong proposals we can build on and advocate for. Past contests have produced useful submissions (e.g., Boaz Barak in the Open Phil worldview contest; Victoria Krakovna, Mary Phuong, and others in ELK) and identified new talent (e.g., Quintin Pope in the OP worldview contest; James Lucassen and Oam Patel in ELK).
Writing down plans and sharing them helps us identify gaps and spot areas for improvement together. For a more thorough overview of extreme AI risks, we recommend reading Managing Extreme AI Risks Amid Rapid Progress.
This contest recognizes that AI safety is a complex domain where no single approach will be sufficient. Plans may not always work, but planning is essential. All submissions are understood to be exploratory contributions to a collaborative field rather than definitive solutions. Participants can choose their level of public association with their work, and all feedback will be moderated to ensure it focuses on improving plans rather than criticizing contributors.
Once set up, we expect to allow 3 months for submissions, closing them in late October.
We will announce winners about a month after submissions close.
Examples of Work That Provides Strategic Clarity on AI Safety Plans
“Plans,” or rather pieces of plans, that we are glad exist because they provide strategic clarity, details about a key component of a plan, or requirements of a comprehensive plan:
“Scenarios” we are glad exist:
What makes these good?
Describes a particular scenario for TAI takeoff
Details the transition from pre-TAI to post-TAI
Lists a concrete series of events or recommended actions
Justifies and explains its threat models and risk factors
Cites a well-defined problem (e.g. AI-designed mirror pathogens)
The cited problems fit the scenario it has established
Clear takeaways
We should be concerned about/watch out for x
We should do y if z happens
The takeaways fit the scenario established
Ideally, takeaways are robustly good across multiple scenarios
Detailed combinations of techniques that aid AI alignment/control