Testing and spreading messages to reduce AI x-risk

AI governance EA Community Choice

AI Safety and Governance Fund

Active grant

$12,600 raised

$50,000 funding goal


Overview

AI companies are currently locked in a race to create a superhuman AI, and no one knows how to make the first superhuman AI not kill everyone. There’s little governmental oversight or public awareness that everyone’s lives are being risked. We think that without government intervention directed at solving the problem and ensuring no existentially dangerous AI is developed anywhere in the world, humanity is not likely to survive. We further think it is possible to carefully but honestly inform governments and the public of the problem and increase the chance of it being addressed. There are low-cost interventions that governments can make even before they fully agree with our understanding of the risk and that seem robustly helpful, e.g., because they increase the chance that governments understand the problem well and are able to address it directly later.

Our nonprofit aims to improve institutional response to existential risk from AI by developing and testing messaging about it and launching campaigns to educate the general public and key stakeholders about AI and related risks.

We think communicating the problem in simple, understandable language that is nonetheless valid (including in its technical details) can help substantially.

What’s your plan?

With minor omissions, we plan to:

Do message testing: understand the relevant variables and determine which messages successfully communicate core intuitions about the technical problem to various demographics, or cause concern for valid reasons.
- Core intuitions we’d want to communicate range from “Normal computer programs are human-made instructions for machines to follow, but modern AI systems are instead trillions of numbers; the algorithms these numbers represent are found by computers themselves and not designed, controlled, or understood by humans” to “AI developers know how to use a lot of compute to make AI systems generally better at achieving goals, but don’t know how to influence the goals AI systems are trying to pursue, especially as AI systems become human-level or smarter” to “If monkeys really want something, but humans really want something different, and humans don’t really care about monkeys, humans would usually get what they wanted even if it means monkeys don’t; if an AI system that’s better at achieving goals than humans doesn’t care about humans at all, it would get what it wants even if it means humans won’t get what they want. We should avoid developing superhuman systems that are misaligned with human values”.
- Very different messages will work most efficiently to successfully produce technical understanding in very different people.
- It might be good to simultaneously encourage governments to incentivize (and not create obstacles for) narrow and clearly beneficial uses of AI (such as in drug development), as opposed to general AI or research that shortens timelines to general AI: we want regulation to target exclusively the inherently risky technology that might kill everyone. Responsible, economically valuable innovation that doesn’t contribute to that risk, including many kinds of startups, should be supported.
Iterate through content to promote with short feedback loops.
Experiment with various novel forms of messaging that have the potential to go viral.
Share results with other organizations in the space.
Scale up: improve understanding of AI and increase support for helpful policies among people whose understanding and support are especially important, and coordinate with others in the space on various actions.

Prepare potential responses to possible future crises, and help people improve their understanding of AI and related risks by clearly and honestly communicating around major issues as they become public.

How will this funding be used?

Expenses by category depending on our overall annual budget:

What's your track record?

Early testing showed that getting members of the general public to read technical explanations of x-risk from AI can cost as little as $0.10 per click. We’ve also had positive experiences communicating about x-risk in person, including, e.g., changing the minds of 3 out of 4 people we talked to at an e/acc event, and explaining the problem to people at think tanks and to some people in the UK government.

Previously, Mikhail Samin, who runs this project, launched a crowdfunding campaign to print HPMOR in Russian with the aim of spreading the book's EA ideas. It became the most funded crowdfunding campaign in the history of Russia, led to 21,000 copies being printed, and got hundreds of thousands of people to read the book online. He also coordinated a project to translate the 80,000 Hours Key Ideas series and videos from Rob Miles and Rational Animations; the translations gathered millions of views. Before focusing on direct impact, he built a leading music recognition provider (which allowed him to donate >$100k to effective nonprofits). He also co-founded the Moscow branch of the Vesna movement (which later organized anti-war protests), collaborated with Navalny's team on many projects, wrote dissenting opinions for members of electoral commissions, wrote appeals that won cases against the Russian government in court, helped Telegram circumvent internet censorship, won a case against Russia at the European Court of Human Rights, and organized protests and campaigns.

What do you plan to do about the risks?

It’s important to carefully monitor for risks throughout the work. For example, we use polls and look at how the messaging lands with various demographics and with people who have different priors, so that we can actively decrease polarization (and prevent the messaging from increasing it), watch for potential backlash, and have good feedback loops from that information; we also avoid anything along the lines of astroturfing. Please note that we're not sharing all the details here, as some of them might be sensitive.

What other funding are you or your project getting?

We’ve received a speculation grant from the SFF; our application is currently being evaluated in the SFF’s s-process.

Update (early 2025):

What progress have you made?

Converging on Reddit as a great platform for message testing:
- Relatively few bots and little noise compared to other platforms (e.g., on Twitter, the noise is very high; we tested this by running nonsensical ads).
- If you leave the comments open, despite Reddit's recommendation, people tend to comment to express their thoughts.
- There are downvotes, which means you get negative feedback; most ads on the platform have more downvotes than upvotes.
- As a negative aspect, it's hard to do very precise targeting: we can target by location and interests/subreddits, but can't directly target by, e.g., gender and age.
Our results on ads that describe AI x-risk:
- Note that right now, we don't know the long-term outcomes: how much do people change their minds or become more precisely concerned about the right problem? (We have some evidence for that via comments, but we don't know what happens to most people.) We'd need higher budgets to be able to robustly measure this.
- Click-through rates (CTR): consistently 3-5% (10x the platform's benchmarks for the best-performing ads), up to 15% for some audiences.
- Upvote ratios (% of votes which are upvotes): consistently around 75% (almost all normal ads on the platform have <50%; from running non-optimized ads that are more similar to normal ads, we estimate that upvote ratios for ads are usually around 5-30%).
- People share our posts! 5-6% of people who click on an advertised post then click on the share button, which I think is insanely high and I was pretty surprised when I looked at it.
- Cost per click (CPC): consistently around $0.10.
- See an example of the kinds of things that we explain in our ads.
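The metrics above are simple ratios of raw counts. As a minimal sketch, assuming hypothetical per-ad counts (the `AdStats` class, its field names, and `clicks_for_budget` are illustrative helpers, not Reddit's reporting API), this shows how CTR, CPC, upvote ratio, and share rate are computed and how a budget converts to expected clicks at a given CPC:

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    """Hypothetical raw counts for one ad; field names are illustrative only."""
    impressions: int
    clicks: int
    upvotes: int
    downvotes: int
    shares: int
    spend_usd: float

    def ctr(self) -> float:
        # Click-through rate: fraction of impressions that led to a click.
        return self.clicks / self.impressions

    def cpc(self) -> float:
        # Cost per click, in dollars.
        return self.spend_usd / self.clicks

    def upvote_ratio(self) -> float:
        # Share of all votes that are upvotes (NaN if nobody voted).
        total_votes = self.upvotes + self.downvotes
        return self.upvotes / total_votes if total_votes else float("nan")

    def share_rate(self) -> float:
        # Fraction of people who clicked and then pressed "share".
        return self.shares / self.clicks


def clicks_for_budget(budget_usd: float, cpc_usd: float) -> int:
    """Expected number of clicks a budget buys at a given cost per click."""
    return round(budget_usd / cpc_usd)


# Made-up counts for illustration only; these are NOT real campaign data.
example = AdStats(impressions=100_000, clicks=4_000, upvotes=300,
                  downvotes=100, shares=220, spend_usd=400.0)
print(f"CTR {example.ctr():.1%}, CPC ${example.cpc():.2f}, "
      f"upvote ratio {example.upvote_ratio():.0%}, "
      f"share rate {example.share_rate():.1%}")
print(clicks_for_budget(700, 0.10))
```

At a CPC of about $0.10, `clicks_for_budget(700, 0.10)` comes out to 7,000 clicks.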

Is there anything others could help you with?

Do you know of any other AI x-risk ad campaigns that get more upvotes than downvotes on Reddit? If you're aware of any, please let us know; we'd be excited to chat with them! (Everything that we've seen that others run gets more downvotes than upvotes.)
Do you know people who understand AI x-risk well and are great at communicating it?
We need more funding! Do you know any funders potentially interested in scaling up that kind of message testing and then using our results at scale?
- We see many interesting differences between different audiences in how they perceive/respond to our messages, but for many of these differences, we don't really know what the underlying variables are. We really want to be able to run and iterate on a lot more ads with targeting that separates the general public into many more segments.
- The amount we've fundraised so far (~$23k) is less than half of the minimum budget we estimate would allow us to get the information we want, so we haven't really started to spend much of it.

AI Safety and Governance Fund

about 2 months ago

Some data from the chatbot:

donated $700

Mikhail Samin

3 months ago

Made a chatbot with 290k tokens of context about AI safety for people who don't believe that AI will kill everyone if anyone makes it superhumanly good at achieving goals. People can send their reasoning/questions/counterarguments on AI x-risk to it and see if it changes their mind: https://whycare.aisgf.us.


donated $5,050

Michael Dickens

11 months ago

I wrote [here](https://mdickens.me/2024/11/18/where_i_am_donating_in_2024/#ai-safety-and-governance-fund) about my donation plans and why I like this plan:

Pushing for x-risk-relevant regulation is the most promising sort of intervention right now. But we don’t have much data on what sorts of messaging are most effective. This project intends to give us that data.

Mikhail Samin, who runs the org, has a good track record of work on AI safety projects (from what I can see).
Mikhail has reasonable plans for what to do with this information once he gets it. (He shared his plans with me privately and asked me not to publish them.)
The project has room for more funding, but it shouldn’t take much money to accomplish its goal.
The project received a speculation grant from the Survival and Flourishing Fund (SFF) and is reasonably likely to get more funding, but (1) it might not; (2) even if it does, I think it’s useful to diversify the funding base; (3) I generally like SFF grants and I don’t mind funging SFF dollars.

donated $5,050

Michael Dickens

about 1 year ago

How do you plan on finding an audience? (Sounds like MTurk or something?) And how do you determine which messages are more successful than others?

donated $700

Mikhail Samin

about 1 year ago

@mdickens We mainly plan to use ads targeting different narrow audiences, and then to compare the impact different messages have on engagement and on the actions people then take on a website (we’ll also be asking them to complete surveys, though that won’t be very informative due to selection effects).

There are downsides (the feedback is somewhat low-resolution, and social media algorithms might add noise, as they won’t be showing the ads to random samples of people), but it seems much cheaper than using Mechanical Turk or focus groups and provides much shorter feedback loops.
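One standard way to check whether one message's click-through rate is genuinely higher than another's is a two-proportion z-test. The sketch below is a swapped-in textbook technique, not necessarily the method used here, and `two_proportion_z_test` is a hypothetical helper. Because ad platforms don't show ads to random samples of people, the test's independence assumption only roughly holds, so its p-values are best read as a rough guide:

```python
import math


def two_proportion_z_test(clicks_a: int, impressions_a: int,
                          clicks_b: int, impressions_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two click-through rates.

    Returns (z, p_value). Uses the pooled proportion under the null
    hypothesis that both messages share the same underlying CTR.
    """
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    if se == 0:
        # No variation at all (every impression clicked, or none did).
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for two messages shown to the same audience segment.
z, p = two_proportion_z_test(500, 10_000, 300, 10_000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Pooling the proportions is the usual choice when the null hypothesis is "no difference"; with small click counts, an exact test would be more appropriate.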

(I also shared a bit on our longer-term strategy and how we'll use our results with Michael.)

donated $50

Lucie Philippon

about 1 year ago

I trust Mikhail's takes on AI safety. He changed my mind on a lot of topics, often quite quickly. I'm looking forward to seeing the results of his project :)

Sasha Cooper

about 1 year ago

@ms Self-donating seems like a prisoner's dilemma defection to me. Many people in this initiative both received money and contributed a project to the selection, and most of us resisted the temptation to self-donate at all, let alone the full amount. Were I a funder considering an initiative like this, I would find it highly off-putting to see this behaviour (since it amounts to a first-come-first-served distribution of the funds, losing almost all the informational value it was supposed to generate).

donated $700

Mikhail Samin

about 1 year ago

@Arepo

(I am confused about the comparison to prisoner’s dilemma. In true prisoner’s dilemma, you want to mutually cooperate, but also, you defect against a stone with “cooperate” written on it. But this is not a prisoner’s dilemma, true or otherwise? I assume you just meant “defection” and weren’t referring to prisoner’s dilemma-class games.)
Quadratic matching of funding means that donating a lot to your own project if others don’t doesn’t produce corresponding matching. There could be a form of prisoner’s dilemma cooperation, where people associated with 5 orgs donated to the others equally, resulting in a lot of funds matched, without providing a lot of informational value. That would’ve been off-putting to a funder considering this and defection from something expected (and sad if EAs tried to play the system this way).
I’ve donated >$30k of my own money to many of my own projects, as they often seem to me to be the highest-impact opportunities (this is why I run them). I’m confused about how donating instead to something that doesn’t seem as cost-effective would make the donation based on more valuable information.
I’m honestly unaware of better impact opportunities. $700 is seven thousand people clicking on a website explaining x-risk from AI. I’ve added to my balance here and donated $50 to Lightcone, but that was mostly purchasing fuzzies, not utilons.
I’m assuming donating to one’s own project is OK and that people can freely do it if they decide to. If a future funder doesn’t want that to happen, they can ask people not to (and receive slightly less information).
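The point about quadratic matching can be illustrated with the textbook quadratic-funding rule, in which a project's matched total is the square of the sum of the square roots of individual contributions. This is a minimal, uncapped sketch: the exact formula, caps, and scaling Manifund used for this round are not stated here and may differ, and `quadratic_match` is a hypothetical helper:

```python
import math


def quadratic_match(contributions: list[float]) -> float:
    """Textbook quadratic-funding match for one project (no cap, no scaling).

    The project's ideal total is (sum of sqrt(contribution))^2; the match
    is that total minus what donors gave directly.
    """
    root_sum = sum(math.sqrt(c) for c in contributions)
    return root_sum ** 2 - sum(contributions)


# A single donor giving $700 alone attracts essentially no match
# (the result is zero up to floating-point rounding)...
print(quadratic_match([700]))
# ...while seven donors giving $100 each attract a large one.
print(quadratic_match([100] * 7))
```

Under this rule, a lone large donation earns essentially nothing extra, while the same amount spread across many donors earns a large match; real rounds usually scale matches down to fit a finite pool.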

Sasha Cooper

about 1 year ago

@ms No real life situations are clean examples of economics games, but this has key PD-related choices in which you reduced overall good by choosing the selfish option:

you could have increased the total funding pool by splitting but decided to concentrate the donation on yourself;
you could have given meaningful information on multiple other projects, but instead just confirmed a disposition toward your own projects that we could have already guessed (because you run them).

Sasha Cooper

about 1 year ago

What proportion of the other proposals did you even read?

donated $700

Mikhail Samin

about 1 year ago

@Arepo I’ve looked at all projects in the AI governance category (though I don’t think I opened/read all of them).

(I’m generally pretty skeptical about most things people are doing and none of the very valuable AI governance EA projects are represented on Manifund.)

Austin Chen

about 1 year ago

Hey @Arepo, I wanted to clarify that self-donation was explicitly permitted in this round and I would not want to characterize it as defecting in prisoner's dilemma. From the FAQ:

Can I direct my funds to a project I work on or am involved with?
Yes! We ask that you mention this as a comment on the project, but otherwise it’s fine to donate to projects you are involved with.

Of course, we at Manifund very much appreciate the thoughtfulness of people like yourself who spent a lot of time evaluating projects outside of their own! But in designing this round, we also wanted to include folks without much time for such evaluation, and just wanted to quickly give to a project they were very familiar with.

donated $700

Mikhail Samin

about 1 year ago

@Austin thanks! I vaguely remembered this being explicitly allowed but couldn’t quickly find that it was in the FAQ.

Sasha Cooper

about 1 year ago

@Austin Thanks for clarifying. I still view it as pretty antisocial/indicative of poor epistemics, even if it's allowed by the rules fwiw - everything I said above still applies.

Neel Nanda

over 1 year ago

Did you intentionally make the max donation $500? Your own donation has already exceeded that, so I imagine you want to raise the cap.

donated $700

Mikhail Samin

over 1 year ago

@NeelNanda Not intentionally. Where do I edit this?

Neel Nanda

over 1 year ago

@ms Hmm, if it's not exposed to users, DM Austin on the Discord (linked in the corner) and ask him to fix it?

donated $700

Mikhail Samin

over 1 year ago

@NeelNanda thanks a lot for flagging this! DMed him.