Testing and spreading messages to reduce AI x-risk

AI governance · EA Community Choice

AI Safety and Governance Fund

Active grant: $12,600 raised of a $50,000 funding goal


Overview

AI companies are currently locked in a race to create a superhuman AI, and no one knows how to make the first superhuman AI not kill everyone. There is little governmental oversight of this race and little public awareness that everyone's lives are being put at risk. We think that without government intervention directed at solving the problem and ensuring no existentially dangerous AI is developed anywhere in the world, humanity is unlikely to survive. We further think it is possible to carefully but honestly inform governments and the public of the problem and increase the chance of it being addressed. There are low-cost interventions that governments can make, even before they fully agree with our understanding of the risk, that seem robustly helpful, e.g., because they increase the chance of governments understanding the problem well and being able to address it directly later.

Our nonprofit aims to improve institutional response to existential risk from AI by developing and testing messaging about it and launching campaigns to educate the general public and key stakeholders about AI and related risks.

We think communicating the problem in simple, understandable language that remains technically valid can help substantially.

What’s your plan?

With minor omissions, we plan to:

  • Do message testing (understand the relevant variables; determine what successfully communicates core intuitions about the technical problem to various demographics, or causes concern for valid reasons).

    • Core intuitions we’d want to communicate range from “Normal computer programs are human-made instructions for machines to follow, but modern AI systems are instead trillions of numbers; the algorithms these numbers represent are found by computers themselves and not designed, controlled, or understood by humans” to “AI developers know how to use a lot of compute to make AI systems generally better at achieving goals, but don’t know how to influence the goals AI systems are trying to pursue, especially as AI systems become human-level or smarter” to “If monkeys really want something, but humans really want something different, and humans don’t really care about monkeys, humans would usually get what they wanted even if it means monkeys don’t; if an AI system that’s better at achieving goals than humans doesn’t care about humans at all, it would get what it wants even if it means humans won’t get what they want. We should avoid developing superhuman systems that are misaligned with human values”.

    • Very different messages will be most effective at producing technical understanding in different people.

    • It might be good to simultaneously promote the government incentivizing (and not creating obstacles for) narrow and clearly beneficial uses of AI (such as in drug development), as opposed to general AI or research that shortens the timelines of general AI: we want regulation to target exclusively the inherently risky technology that might kill everyone. Responsible, economically valuable innovation that doesn’t contribute to that risk, including lots of kinds of startups, should be supported.

  • Iterate through content to promote with short feedback loops.

  • Experiment with various novel forms of messaging that have the potential to go viral.

  • Share results with other organizations in the space.

  • Scale up; improve understanding of AI and increase support for helpful policies among the people whose understanding and support could matter most; and coordinate with others in the space on various actions.

  • Prepare potential responses to possible future crises, and help people improve their understanding of AI and related risks by clearly and honestly communicating around major issues as they become public.

How will this funding be used?

Expenses by category will depend on our overall annual budget.

What's your track record?

Early testing showed that getting members of the general public to read technical explanations of x-risk from AI can cost as little as $0.10 per click. We've also had positive experience communicating about x-risk directly, e.g., changing the minds of 3 out of 4 people we talked to at an e/acc event and explaining the problem to people at think tanks and to some people in the UK government.

Previously, Mikhail Samin, who runs this project, launched a crowdfunding campaign to print HPMOR in Russian with the aim of spreading the book's EA ideas. It became the most funded crowdfunding campaign in the history of Russia, led to 21,000 copies being printed, and got hundreds of thousands of people to read the book online. He also coordinated a project to translate the 80,000 Hours Key Ideas series and videos from Rob Miles and Rational Animations; the translations gathered millions of views. Before focusing on direct impact, he built a leading music recognition provider (which allowed him to donate >$100k to effective nonprofits), co-founded the Moscow branch of the Vesna movement (which later organized anti-war protests), collaborated with Navalny's team on many projects, wrote dissenting opinions for members of electoral commissions, wrote appeals that won cases against the Russian government in court, helped Telegram with internet censorship circumvention, won a case against Russia in the European Court of Human Rights, and organized protests and campaigns.

What do you plan to do about the risks?

It’s important to carefully monitor for risks throughout the work: e.g., use polls and track how various demographics and people with different priors respond to the messaging, so as to actively decrease polarization (and prevent the messaging from increasing it); watch for potential backlash and maintain good feedback loops based on that information; avoid anything along the lines of astroturfing; etc. Please note that we're not sharing all the details here, as they might be somewhat sensitive.

What other funding are you or your project getting?

We’ve received a speculation grant from the SFF; our application is currently being evaluated in the SFF’s s-process.

Update (early 2025):

What progress have you made?

  • Converging on Reddit as a great platform for message testing:

    • Not a lot of bots/not very noisy compared to other platforms (e.g., on Twitter, the noise is very high; we tested this by running nonsensical ads).

    • If, despite Reddit's recommendation, you leave the comments open, people tend to comment to express their thoughts.

    • There are downvotes, which means you can have negative feedback; most ads on the platform have more downvotes than upvotes.

    • As a downside, it's hard to do very precise targeting: we can target by location and interests/subreddits, but can't directly target by, e.g., gender or age.

  • Our results on ads that describe AI x-risk:

    • Note that right now, we don't know the long-term outcomes: how much do people change their minds or become more precisely concerned about the right problem? (We have some evidence for this via comments, but we don't know what happens to most people.) We'd need higher budgets to be able to measure this robustly.

    • Click-through rates (CTR): consistently 3-5% (10x the platform's benchmarks for the best-performing ads), up to 15% for some audiences (see the sketch after this list for how these metrics are computed).

    • Upvote ratios (% of votes that are upvotes): consistently around 75% (almost all normal ads on the platform have <50%; from running non-optimized ads, which are more similar to normal ads, we estimate that upvote ratios for ads are usually around 5-30%).

    • People share our posts! 5-6% of people who click on an advertised post then click the share button, which I think is insanely high; I was pretty surprised when I first looked at it.

    • Cost per click (CPC): consistently around $0.10.

    • See an example of the kinds of things that we explain in our ads.
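To make these metrics concrete, here is a minimal sketch in Python with made-up numbers (chosen only to fall roughly within the ranges quoted above; they are not actual campaign data) of how CTR, CPC, upvote ratio, and share rate are computed:

```python
# Hypothetical ad-campaign counts, for illustration only (not actual results
# from this project). The formulas are the standard definitions of the
# metrics reported above.
impressions = 20_000   # times the ad was shown
clicks = 800           # clicks on the ad
spend_usd = 80.00      # total spend on this ad
upvotes = 150
downvotes = 50
shares = 45            # clicks on the share button after opening the post

ctr = clicks / impressions                      # click-through rate
cpc = spend_usd / clicks                        # cost per click
upvote_ratio = upvotes / (upvotes + downvotes)  # % of votes that are upvotes
share_rate = shares / clicks                    # share clicks per post click

print(f"CTR:          {ctr:.1%}")           # 4.0%, in the 3-5% range above
print(f"CPC:          ${cpc:.2f}")          # $0.10
print(f"Upvote ratio: {upvote_ratio:.0%}")  # 75%
print(f"Share rate:   {share_rate:.1%}")    # ~5.6%
```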

Is there anything others could help you with?

  • Do you know of any other AI x-risk ad campaigns that get more upvotes than downvotes on Reddit? If you're aware of any, please let us know; we'd be excited to chat with them! (Everything we've seen that others run gets more downvotes than upvotes.)

  • Do you know people who understand AI x-risk well and are great at communicating it?

  • We need more funding! Do you know any funders potentially interested in scaling up this kind of message testing and then using our results at scale?

    • We see many interesting differences between different audiences in how they perceive/respond to our messages, but for many of these differences, we don't really know what the underlying variables are. We really want to be able to run and iterate on a lot more ads with targeting that separates the general public into many more segments.

    • The amount we've fundraised so far (~$23k) is less than half of the minimum budget we estimate would allow us to get the information we want, so we haven't really started to spend much of it.

Similar projects

  • General support for SaferAI (SaferAI): support for SaferAI's technical and governance research and education programs to enable responsible and safe AI. AI governance; $100K raised.

  • The Midas Project (Tyler Johnston): AI-focused corporate campaigns and industry watchdog. AI governance, Global catastrophic risks; $0 raised.

  • General Support for the Center for AI Policy (Thomas Larsen): help us fund 2-3 new employees to support our team. AI governance; $0 raised.

  • Experiments to test EA / longtermist framings and branding (David Moss): testing responses to "effective altruism", "longtermism", "existential risk", etc. EA community; $26.8K raised.

  • Effective AI Awareness: improving evidence-based AI risk communication (Monica Ulloa). AI governance; $0 raised.

  • Stampy's AI Safety Info (Steven Kaas): creating an interactive FAQ to explain existential risk from AI to all audiences. $0 raised.

  • Large-scale RCT testing of AI effects on 2024 election (Jonathan Stray): the Prosocial Ranking Challenge is testing novel LLM-based algorithms on real social media platforms. We have 8 winners but can only fund 3. Technical AI safety, Global catastrophic risks; $0 raised.