AI safety has a knowledge transfer challenge. As the field grows, each new entrant needs to invest significant time understanding core concepts before they can contribute meaningfully. This creates a bottleneck of "missing interpretive labor" - effort spent on understanding ideas that could otherwise go toward advancing safety strategies. When every new entrant has to independently spend hours piecing together the same understanding, we waste a lot of valuable time. As capabilities accelerate, this bottleneck becomes increasingly costly.
The Atlas Project addresses this challenge by creating systematic explanations of AI safety concepts. We create a shared foundation of understanding, accelerating the integration of new contributors, enabling more effective governance discussions, and broadening engagement by clearly articulating both immediate and long-term challenges in AI development. Rather than each person independently struggling through scattered explanations, we craft clear, progressive learning pathways that build understanding step by step. The impact is already evident: in just six months, over 400 students across multiple institutions have adopted our materials, with course organizers consistently returning to them across multiple semesters.
Every hour saved in understanding multiplies across the community. When a clear explanation saves each person just one hour, the impact can scale to hundreds of hours collectively redirected toward meaningful engagement with AI safety. With funding, we can significantly increase our impact by upgrading existing materials and creating supporting resources in multiple formats (like audio/video) and languages to reach even more audiences.
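As a rough back-of-the-envelope illustration, using only figures that appear elsewhere in this proposal (400+ students so far, one hour saved per reader, roughly 20 hours of distillation work per section) and not a precise model:

```latex
% n   = number of readers reached (400+ students so far)
% dt  = time saved per reader by a clear explanation (assume 1 hour)
% t_d = one-time distillation effort for one section (roughly 20 hours)
\[
\text{net hours saved} \;\approx\; n \cdot \Delta t - t_d
\;\approx\; 400 \times 1\,\mathrm{h} - 20\,\mathrm{h}
\;=\; 380\,\mathrm{h}
\]
```

The one-time cost is paid by the distiller; the savings recur with every new cohort of readers.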
Here is a link to the current version of the Atlas.
TL;DR
AI safety faces a knowledge transfer bottleneck as the field rapidly expands
Full-time work on creating clear explanations can save potentially hundreds of hours on coordination and understanding
Recent developments increase urgency of efficient knowledge transfer
High-quality distillation requires systematic, dedicated full-time effort currently missing in the field
What is interpretive labor and why does it matter? There's a fundamental tradeoff between the energy put into explaining an idea and the energy needed to understand it. At one extreme, someone can craft a beautiful explanation that makes understanding feel effortless. At the other, they can do the minimum and leave their audience to struggle. This balance of energy is called interpretive labor. (Olah & Carter, 2017)
The core vision of the Atlas project is to accelerate AI safety progress by performing interpretive labor and reducing research debt. Our work serves three critical functions. First, it accelerates the technical researcher pipeline by helping new researchers quickly build deep understanding. Second, it enables more effective governance by helping policymakers engage with technical concepts. Third, it moves the Overton window by clearly articulating both immediate and long-term risks, making AI safety more accessible to broader audiences who influence the trajectory of AI development.
What makes our approach uniquely valuable is that the Atlas provides one of the few comprehensive end-to-end explanations of AI safety available today. Unlike distillation resources that focus on single agendas or isolated concepts, our 600+ page literature review creates a complete conceptual map - from fundamental risks and capabilities through governance frameworks to technical safety strategies and their limitations. This comprehensive approach allows readers to understand how all pieces of AI safety research fit together. We trace the connections between problems and potential solutions, showing both what we know and the critical gaps where more work is needed. By integrating perspectives from hundreds of sources across academia, industry labs, and independent research, we provide the essential context that allows newcomers to grasp not just individual concepts, but the full landscape of AI safety efforts. This holistic understanding enables more effective coordination and reduces redundant work across the field.
Why do multiplier effects make this more impactful? The cost of explaining stays constant whether reaching one person or thousands, but the cost of understanding multiplies with each new person entering the field. Understanding AI safety requires integrating knowledge from an overwhelming number of domains - from machine learning and cognitive science to philosophy of mind, game theory, governance frameworks, and many more. When understanding becomes too difficult, people either give up or specialize prematurely, missing important connections. Research debt accumulates, ideas get forgotten or repeatedly rediscovered, and the field fragments. (Olah & Carter, 2017)
Why do we think this vision needs more support? Like the theoretician or the research engineer, the research distiller plays an integral role in a healthy research community. But right now, almost no one fills it. Many want to work on research distillation but lack the support structure - there's no clear career path, few places to learn, and limited examples to follow. The field often doesn't recognize distillation as a real research contribution, overlooking how crucial systematic knowledge transfer is for scaling impact in AI safety.
AI CEOs like Altman and Amodei estimate transformative AI within 3 years (LintzA, 2025). Even accounting for potential bias in this estimate, recent developments like DeepSeek r1, OpenAI o3, and Stargate show, at the very least, that capabilities are accelerating. EpochAI projects that progress will keep speeding up (EpochAI, 2025). Technical understanding that previously could be built over years now needs to be acquired in months. Without systematic knowledge transfer, we risk critical safety work being bottlenecked by the time it takes each person to piece together understanding.
From our perspective, creating these explanations isn't just adding a small layer of polish to existing research - it requires transforming ideas completely. To distill an idea while also connecting it to others, we need deep technical understanding plus the ability to craft intuitive progressions that make complex concepts click. This is why we need dedicated, full-time focus. Every hour we invest in this work can save hundreds of collective hours down the line.
Markov is the primary author and project developer. He contributes across all aspects of the project - research, writing, distillation, website development, video creation, etc. He was previously a scriptwriter for AI safety videos at Rational Animations and a distillation fellow at Rob Miles’ AI Safety Info (Stampy) project, where he wrote explanations of dozens of AI safety concepts and contributed to building the retrieval-augmented generation (RAG) based distillation chatbot. He is also co-founder and CTO at Equilibria Network, and has previously worked in both software development and cybersecurity.
Charbel is the Executive Director of CeSIA. He leads the organization, its scientific direction, and overall coordination. He brings significant pedagogical experience, including supervising ARENA projects, MLAB, and developing Europe's first general-purpose AI safety course. As a quick example, his writing is part of BlueDot's official interpretability curriculum. His posts analyzing and distilling Davidad’s research received significant traction, and two of them were counted among the best of LessWrong 2023.
We also have Jeanne Salle (AI safety teacher at ENS Ulm), and Charles Martinet (head of policy at CeSIA, research affiliate at Oxford AIGi and previously at GovAI) as contributing authors.
Vincent Corruble is a project advisor. He is a professor at Sorbonne University and a research fellow at CHAI.
Fabien Roger is a scientific advisor. He has worked at Redwood Research and is now at Anthropic.
If you want more detailed track records, please reach out privately and we can send over CVs.
TL;DR:
Created comprehensive AI safety literature review used by 400+ students
Built efficient systems for ongoing content creation and updates
Achieved organic adoption across multiple institutions
Received specific, actionable feedback for improvements to existing content
Developed scalable AI research distillation pipeline
What have we built so far? In the last few months, we've developed the AI Safety Atlas - a systematic treatment and literature review of most concepts in AI safety. The text covers everything from fundamental concepts like capabilities and risks through to technical topics like interpretability and oversight, split across nine chapters. Beyond writing the core text, we have integrated interactive elements, developed low-friction collaboration mechanisms (an automated open-source editing → website → LaTeX pipeline), experimented with audio/video content, and built sustainable systems to maintain and improve it alongside new developments in AI.
How is what we built being used? Several groups have integrated the Atlas into their curricula. ML4Good uses our materials as core reading content, reaching 250 students annually. ENS Ulm and Saclay have built their university courses around our text, serving ~100 students per year. The AI Safety Collab (~80 students) and UBC Vancouver (30-50 students) used our materials in their programs and decided to use them again for their next iterations. We have also been approached, unprompted, by many individual readers thanking us for our work. The European Network for AI Safety (ENAIS) has also shown interest in using our work as part of various courses, and they think CeSIA is in a unique position to do this kind of work.
What have we learned from the initial impact? Two aspects of our growth stand out. First, adoption has been largely organic - we had very limited resources, so we didn't spend on any marketing or outreach. Course organizers are discovering our materials and choosing to use them on merit. Second, we see strong repeat usage: when institutions try our materials, they tend to stick with them. Largely organic growth combined with repeated use across multiple semesters is quite encouraging.
How do we measure quality and impact? We track detailed metrics for every chapter including overall ratings, writing clarity, self-reported understanding, section-specific ratings, and categorized improvement requests. Newer chapters (averaging 4.2/5) significantly outperform older chapters (3.7/5). Our success metric is achieving 4.5/5 on understanding and recommendation likelihood.
As for evaluating the quality of our current materials, here is a quote from an AI Safety organizer at UBC Vancouver (Jan 2025): "For a systematic, centralized, and concise introduction to the core topics in AI safety, this is the best and most up-to-date resource I know. That's why we chose it as the basis for our Intro to AI Alignment course."
And a quote from a student (Feb 2025): "This was my first time reading the AI Safety Atlas and wow!!! It has so much insightful information and citations that I am looking forward to diving deeper into. I am super fascinated by this text and I feel I have learned so much from it already."
How are we evolving our approach based on what we've learned? Our initial approach to content development faced two challenges. First, connections between concepts helped students most, yet our early chapters lacked enough of these connections. Second, keeping up with the rapidly evolving AI safety literature became increasingly difficult. To address these challenges, we've started developing an AI-augmented distillation pipeline for our newest content (a minimal sketch follows the list below). The pipeline analyzes hundreds of sources simultaneously across:
Research papers (ArXiv, PubMed, Nature, IEEE...)
Blog posts (Alignment Forum, LessWrong, MIRI...)
Podcast transcripts (AXRP, MLST, Inside View...)
Research reports & Think Tanks (RAND, CSET, IAPS, GovAI...)
Other course materials (MATS, CAIS, BlueDot...)
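To make this concrete, here is a minimal sketch of the kind of source-aggregation step such a pipeline needs. It is illustrative rather than our actual implementation: it only covers the arXiv source from the list above (the other source types would each need their own fetchers), and the function names are hypothetical.

```python
"""Illustrative sketch of one aggregation step in an AI-augmented
distillation pipeline: pull recent papers on a concept from the public
arXiv API and bundle their abstracts into a context that a distiller
(or an LLM assistant, with human review) can work from."""

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def fetch_arxiv_abstracts(query: str, max_results: int = 20) -> list[dict]:
    """Return title/abstract pairs for recent arXiv papers matching `query`."""
    params = urllib.parse.urlencode({
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as response:
        feed = ET.fromstring(response.read())
    return [
        {
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip(),
            "abstract": entry.findtext("atom:summary", default="", namespaces=ATOM_NS).strip(),
        }
        for entry in feed.findall("atom:entry", ATOM_NS)
    ]


def build_distillation_context(concept: str) -> str:
    """Bundle candidate sources for one concept into a single review context."""
    papers = fetch_arxiv_abstracts(concept)
    sources = "\n\n".join(f"{p['title']}\n{p['abstract']}" for p in papers)
    return f"Concept: {concept}\nCandidate sources ({len(papers)} abstracts):\n\n{sources}"


if __name__ == "__main__":
    # Hypothetical usage: gather material for a chapter update.
    print(build_distillation_context("goal misgeneralization")[:1000])
```

Curation, summarization, cross-chapter linking, and human review would sit on top of a step like this.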
We've only applied this new methodology to our most recent chapter (Evaluations) and parts of the Risks chapter, but we've already seen improved student feedback: "I like the amount of real-life examples incorporated into the writing when talking about a certain concept, and the citations following those examples so we can go more in depth into specific examples if we like."
The growth trajectory looks good and is expected to scale further with more outreach. We know which parts need improvement: we have granular feedback telling us which specific sections in each chapter are strong or weak, and what exactly students feel is lacking. Students want more connections between concepts, more comprehensive coverage, and visual explanations to accompany the text. Facilitators need better supporting materials to run effective discussions.
Our more recent chapters, written with greater experience, meet a higher quality standard than the ones written at the start of the project. We now have the processes, expertise, and feedback mechanisms to systematically upgrade all content to our current, higher standards. We just need more time and resources to continue implementing these improvements at scale while expanding into new formats that enhance understanding.
Different programs serve different needs. We aim to support and complement existing approaches by creating base layer explanations that they can further build their programs and courses upon.
BlueDot does a great job of carefully curating the best existing explanations to create learning experiences, rather than writing new explanations. More focused programs like MATS, ARENA, SPAR, ERA, and others concentrate on mentorship, hands-on engineering, and research development.
As for independent course organizers, they can use whatever materials work best for their specific needs. Research fellowship and mentorship programs can use our materials to establish foundations before moving to specialized training. We actively look at the knowledge requirements and learning outcomes such programs expect (e.g. DeepMind Safety education materials, the MATS curriculum, the Cooperative AI curriculum, BlueDot’s various curricula, and more) and incorporate them into our work. We're not trying to replace these approaches - we're trying to make existing ones more effective by reducing the effort needed to understand core concepts.
Our work shares general goals with the CAIS textbook in making AI safety concepts accessible, but we offer distinct and complementary coverage. The CAIS textbook is very good at providing a systems-oriented foundation and focuses a large chunk of its explanations on topics like safety engineering, collective action problems and complex systems theory. The Atlas takes a slightly different approach by going deeper into specific problems and mitigation strategies.
As concrete comparisons, the CAIS textbook condenses topics like evaluations, monitoring, and interpretability into a single subsection. The Atlas dedicates an entire chapter to each of these domains, providing much greater technical depth. Similarly, while CAIS addresses alignment and robustness in one subsection, we offer multiple comprehensive chapters on specific risks and alignment challenges - scalable oversight, misspecification, and misgeneralization - each exploring the problem, its risks, and current solution strategies. We also plan future chapters on cybersecurity, systems theory, and multi-agent alignment research.
These structural differences illustrate how both works serve complementary purposes in the ecosystem. Collectively, alongside BlueDot, CAIS, and all the other field building programs, we can strengthen the field by reaching different learning styles and serving various educational needs.
TL;DR:
Six core deliverables: quality updates to existing chapters, professional translations, audio versions, video explanations, teaching support, and technical improvements (website, parser, research scraper, etc.)
Each chapter upgrade represents substantial enhancement based on our new methodology
Clear metrics track improvements over time, with a target success metric of a 4.5/5 rating for self-reported understanding
Other metrics that will inform our path and progress include the number of students reached and engagement with audio/video content
Supporting materials address specific needs identified by students and facilitators
Flexibility to adjust priorities based on user feedback and demonstrated impact
Our future plans focus on six key deliverables that address different aspects of knowledge transfer in AI safety. The scope and timeline for delivering these improvements depends directly on funding level.
Deliverable 1: Quality updates to existing chapters. All the existing chapters are functional, but we want to raise the bar for quality. These updates are substantial; they don't mean simple edits or maintenance. As highlighted in the track record section, we have grown as research communicators and have developed better methodologies for distillation and for wider literature review.
To be a little more concrete on what an update to an existing chapter would look like, let's use the example of the Generalization chapter. Currently it adequately explains concepts like goal misgeneralization and deceptive alignment, but it draws mainly from a few DeepMind and MIRI papers. An updated version would integrate much broader ML literature on inductive biases, geometry of loss landscapes, and out-of-distribution generalization - showing how different training approaches lead to different learned algorithms. It would incorporate more empirical demonstrations, disambiguate concepts like situational awareness and scheming, and strengthen connections to other chapters. Each section upgrade requires at least 20 hours (each chapter is made up of 3-5 sections), covering feedback integration, comprehensive literature review, original diagrams, and cross-chapter connections.
Improving our content and developing our technical pipeline are closely interrelated processes. To create better explanations, we simultaneously refine the AI-augmented distillation tools that support this work. This is mainly for scalability and sustainability reasons, since we can't keep up with the amount of new research being published without AI augmentation. At higher funding tiers, we'll invest more in extending this pipeline, potentially creating a fork of the alignment research dataset (ARD) and building upon existing work. This means funding doesn't just support better content today, but builds lasting infrastructure for scaling knowledge transfer in AI safety.
Deliverable 2: Teaching materials & facilitator support. A retrospective by facilitators from 13 AI safety university groups in 2024 highlighted various needs that we want to support. As one example, they specifically requested PDFs and printed materials (shown to increase retention) and discussion guides. We've already started building a parser pipeline that renders content into multiple formats (including LaTeX), but it needs a little more development time to be seamlessly integrated and to ensure consistent quality; a simplified sketch of this rendering step follows below. Based on this request, we also want to provide printed copies for various universities to use. This is just one example of facilitator support, which is not restricted to building facilitation guides or coordination; it also highlights why we may at times need help scaling from software development contractors.
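As a rough illustration of the single-source, multiple-formats idea (assuming Markdown chapter sources; the file paths and the subset of syntax handled are hypothetical, and the real pipeline covers much more, such as citations, figures, and interactive elements):

```python
"""Simplified sketch of a rendering step: convert a small subset of a
Markdown chapter into a LaTeX body that can be compiled into a printable
PDF, alongside the web version."""

import re
from pathlib import Path

HEADING = re.compile(r"^(#{1,3})\s+(.*)$")
BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"\*(.+?)\*")
LATEX_LEVELS = {1: "section", 2: "subsection", 3: "subsubsection"}


def markdown_to_latex(markdown: str) -> str:
    """Convert headings and emphasis; everything else passes through."""
    out = []
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            out.append(f"\\{LATEX_LEVELS[level]}{{{title}}}")
            continue
        line = BOLD.sub(r"\\textbf{\1}", line)
        line = ITALIC.sub(r"\\textit{\1}", line)
        out.append(line)
    return "\n".join(out)


def render_chapter(source: Path, out_dir: Path) -> Path:
    """Write the LaTeX version of one chapter and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{source.stem}.tex"
    target.write_text(markdown_to_latex(source.read_text(encoding="utf-8")), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical paths; the actual repository layout may differ.
    print(render_chapter(Path("chapters/generalization.md"), Path("build/latex")))
```

A production version would lean on an existing converter (e.g. Pandoc) rather than hand-rolled regexes; the point here is only to show how one source can feed the website, PDFs, and printed materials.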
Deliverable 3: Audio content. We've tested AI-assisted audio generation (NotebookLM), but it still needs human review to ensure accuracy. With sufficient funding, we could produce accurate, higher-quality, manually recorded audio.
Deliverable 4: Video explanations. We've also experimented with some basic pilot video content at Sorbonne University with positive feedback. Our intuition is that having videos could be very impactful. So we want to experiment by providing more video content for our chapters. Videos require approximately 40 hours each for script development, diagram creation, recording, and post-production. This is in addition to the time required to update the core writing for the chapter.
Deliverable 5: Website improvements. As our primary engagement medium, the website needs improvements including more interactive elements, professional design, and reliable content rendering. Previous UI/UX improvements have shown strong returns on investment through increased engagement. We believe there is still a lot of room to improve the website for both functionality and aesthetics, so we want to dedicate time and effort into doing so.
Deliverable 6: Professional translations. We want to deepen integration with the international AI safety ecosystem, particularly in China. We can use AI-assisted tools for basic translations, but this does not guarantee high quality. Mandarin would be the highest priority, but we also have volunteers for some other languages (French, German, Portuguese). Funding would allow fair compensation for this work and would ensure professional-quality translations while maintaining technical accuracy.
We maintain flexibility to adjust priorities based on demonstrated impact - redirecting resources if certain formats show exceptional engagement or value.
TL;DR:
Demonstrated impact with minimal resources (400+ students across multiple institutions)
Currently constrained by having just one full-time team member
Four funding tiers designed for progressive impact scaling:
€47k/6mo: Maintains one FTE focused on updates to 4 priority chapters, develops at least 1 pilot video.
€94k/12mo: Updates all 9 chapters, creates new Cooperative AI chapter, produces 4+ videos, upgrades facilitator guides.
€134k/12mo: Adds contractor support for a Chinese translation, a paid facilitator program (one round of multiple cohorts), and paid development work.
€183k/12mo: Extends funding to 5 professional translations, 3 facilitation cohorts, advanced technical development of a scalable automated distillation assistant.
Our initial six months were funded by a Manifund regrant from Ryan Kidd and Open Philanthropy. This bootstrap funding helped us create an MVP and demonstrate clear demand and impact. Given that the content created under the initial grant is now used by hundreds of students across multiple institutions, we believe we fulfilled that objective. However, we're currently severely constrained by having just one full-time team member handling everything from content development to technical infrastructure. We need funding to maintain and improve our core explanatory content, expand into multimedia formats that enhance understanding, and ideally scale our impact through translations and comprehensive facilitator support.
We provide a detailed funding breakdown into granular amounts, hourly rates, and deliverables at this link; this document presents only an overview.
This minimal funding maintains project continuity and enables focused improvement of the highest-priority chapters. It retains one full-time developer to continue basic operations and produce three core deliverables: quality updates to key chapters, initial video development, and ongoing website improvements. This tier represents bare minimum viability - enough to keep the project alive and demonstrate our improved methodology, but far short of what we believe is the project's full potential.
Time Allocation & Deliverables:
Core Content Updates
Significant improvements to four chapters: Strategies, Specification, Generalization, Interpretability
Video Content Development
Create at least one pilot video
Record lectures and presentations (minimal post-production)
Technical Infrastructure
Parser development + maintenance
Website transition to new framework
Continue development on distillation pipeline - scrapers, aggregators, etc.
Recurring Costs: €6,900/month (€41,400 total, 89.4%)
One-time Costs: €4,900 (10.6%)
A full year of funding allows us to systematically improve all existing content while developing essential supporting materials for educators. This longer runway enables us to implement our research distillation methodology across the entire textbook, write new chapters on missing topics, and create teaching support materials. While this provides a more sustainable approach, we still rely on one full-time contributor and volunteer work for several core deliverables - facilitation, translations, and audio.
Time Allocation & Deliverables (in addition to all previous)
Core Content Updates
Significantly improve all 9 existing chapters (vs. only 4 chapters)
Write new chapter: Cooperative AI/Multi-agent alignment
Video Content Development
Create 4+ educational videos
Record lectures and presentations (minimal post-production)
Facilitator Support Materials
Further develop discussion guides
Further develop consolidated assessment tools and feedback forms for proper progress tracking
Recurring Costs (88.1%): €6.9k/month (€82.8k total)
One-time Costs (11.9%): €11.2k total
This funding level breaks the 1 FTE bottleneck by enabling specialized contractor support, allowing work to progress on multiple deliverables simultaneously while maintaining a 12-month runway. By bringing in external contractors for software development and other specialized tasks, the lead developer can focus on improving core content quality. With funding to fairly compensate translators, designers, and facilitators, we can move beyond volunteer-dependent work, establish reliable timelines, and better demonstrate the project's potential with proper support structures.
Time Allocation & Deliverables (in addition to previous)
Core Content Updates
Write second new chapter: Cybersecurity for AI
Video Content Development
6-8 videos
Professionally edited YouTube-style videos instead of simple recorded lectures
Translation Program
Complete professional Chinese translation
Facilitation Program
One complete course round (of multiple cohorts) with fairly compensated facilitators
Technical Infrastructure
Basic software development on automated distillation assistant
Recurring Costs (75.7%): €8.5k/month (€102.1k total)
Contractor Support (24.3%): €32.4k
Safety Buffer (10.0%): €13.4k
This ideal funding enables comprehensive expansion across all deliverables, with greater language coverage, more facilitation rounds, and complete content development across all priority topics. With sufficient specialist support for development, translations, and teaching, we can fully deliver on all six core deliverables without relying on volunteer work. We can fulfill facilitator requests, produce professionally edited videos, guarantee translations in multiple languages, and expand our impact through additional facilitation rounds throughout the year.
Time Allocation & Deliverables (in addition to previous)
Core Content Updates
Write third new chapter: Agent Foundations
Video Content Development
9+ professional-quality educational videos
Translation Program
Three languages total (Chinese plus two additional languages)
Facilitation Program
Three facilitation rounds throughout the year
Printing and distribution of physical materials
Technical Infrastructure
Extensive software development on automated distillation assistant
Recurring Costs (64.0%): €9.9k/month (€118.7k total)
Contractor Support (26.0%): €46.3k
Safety Buffer (10.0%): €17.8k
Is there value in additional funding beyond the requested level? Yes! We are trying to be pragmatic by offering the different tiers with concrete deliverables, but this is a very ambitious project with a lot of scope for development. Additional funding beyond the requested amounts can be used to offer more facilitation programs. We could accommodate more students, or work with facilitators for longer periods to improve course delivery. We could invest in professional-grade video production and editing to create higher-quality educational content. With sufficient development resources and several people working together on the project, we could build out a full-fledged educational platform, potentially integrating with established providers like Coursera or Udemy to dramatically increase reach. Finally, we could hire more developers to properly extend the alignment research dataset and create a full-fledged AI safety research distillation assistant to make sure the project is sustainable. Each of these expansions directly multiplies our impact by making the content more accessible and engaging to a wider audience.
We're intentionally structured as a non-profit academic initiative rather than seeking self-sustainability through revenue.
To ensure our funding request is appropriately calibrated, we've benchmarked our costs against comparable projects and roles in the AI safety space. The minimal scenario is on the lower end of the location-adjusted range (€60k), and the ideal scenario is in the middle of the location-adjusted range (€80k). The work encompasses technical research, communication, full-stack software development, content creation, website design, and video production. The compensation, adjusted for experience and responsibilities, seems fair relative to comparable positions:
BlueDot AI Safety Specialist (2025, Remote): £60-90k ~= €72.5-108k
EpochAI Communications Lead (2025, Remote): $80-110k ~= €76-105k
AI Digest Technical Staff (2025, Remote): $100-200k ~= €95k-190k
MIRI Communications Lead (2024, Remote): $75-175k ~= €71-165k
FAR AI Technical Communications (2024, Remote): $60-100k ~= €57-95k
Rational Animations Scriptwriter (2024, Remote): $50/hour or €60k/y
Range: €74k-146k
Location adjusted range (- 25%): €55.5k - €109.5k
Facilitator reference classes:
CAIS IMLS: $1000/cohort (€950/cohort)
BlueDot AISF: £1180/cohort (€1400/cohort), £27-40/hour (€32-47/hour)
Our rate: €12k/cohort (5 facilitators, 10 weeks, 5 hours/week at €25/hour) (each additional cohort by same facilitator paid less due to overlapping work)
Field evolution speed: AI safety is advancing incredibly rapidly. New papers, results, and paradigm shifts emerge weekly. Our content could become outdated faster than we can update it manually. This is a big risk. It is also why we're investing a portion of our time in developing an automated research distillation pipeline - to help us efficiently process and integrate new developments. If successful, this would allow our explanations to scale alongside field developments. This is a core component not just of AI safety distillation and communication work, but also of automating parts of AI safety research, which is itself part of the broader alignment plan. If the automated assistance proves insufficient, we would need to significantly narrow our scope to maintain quality, or request more funding for researchers and editors to parallelize the work manually.
Medium mismatch: We're investing in text-based explanations, but this might not match how people best learn these concepts. While we're exploring multiple formats (video, audio, interactive elements), spreading resources across these could prevent any single medium from reaching excellence. Our staged approach lets us test each format with real users before significant investment. If we discover text isn't the optimal medium, we can pivot our focus based on user engagement data.
User experience barriers: Even excellent content won't have an impact if the reading experience is poor. Our website must compete for attention with professionally designed platforms. If navigation is confusing or content presentation is subpar, we'll lose readers before they engage with the substance. While we have plans for UI/UX improvements, we need to carefully balance technical development with content quality.
Scope vs quality tradeoff: We risk trying to do too many things at once - comprehensive textbook, video series, teaching platform, translation program. Basically falling prey to scope creep and competing on too many fronts. This could lead to delivering mediocre quality across all fronts rather than excellence in key areas. Getting this balance wrong could severely limit our impact as users won't return to or recommend a subpar resource. If we try to do everything, we risk doing nothing well enough to matter. We aim to use clear metrics to identify when to double down on what's working versus when to explore new approaches.
Thanks a lot for reading!