The First Actionable Interpretability Workshop at ICML 2025

Science & technology · Technical AI safety

Sarah W.

Proposal · Grant
Closes July 19th, 2025
$1,500 raised
$500 minimum funding
$1,500 funding goal
Fully funded and not currently accepting donations.

Project summary

The workshop on Actionable Interpretability @ ICML 2025 aims to foster discussions on leveraging interpretability insights to drive tangible advancements in AI across diverse domains. We welcome contributions that move beyond theoretical analysis, demonstrating concrete improvements in model alignment, robustness, and real-world applications. Additionally, we seek to explore the challenges inherent in translating interpretability research into actionable impact.

Our areas of interest include:

  1. Practical applications of interpretability insights to address key challenges in AI such as hallucinations, biases, and adversarial robustness, as well as applications in high-stakes, less-explored domains like healthcare, finance, and cybersecurity.

  2. Comparative analyses of interpretability-based approaches versus alternative techniques like fine-tuning, prompting, and more.

  3. New model architectures, training paradigms, or design choices informed by interpretability findings.

  4. Incorporating interpretability, which often focuses on micro-level decision analysis, into more complex scenarios such as reasoning processes or multi-turn interactions.

  5. Developing realistic benchmarking and assessment methods to measure the real-world impact of interpretability insights, particularly in production environments and large-scale models.

  6. Critical discussions on the feasibility, limitations, and future directions of actionable interpretability research. We also invite perspectives that question whether actionability should be a goal of interpretability research.

We have additionally invited a diverse array of keynote speakers and panelists (8 total) from both academia and industry to discuss their efforts to make interpretability research actionable. We have 89 accepted papers, many of which make concrete contributions to technical AI safety research, on topics such as using interpretability to perform model steering, unlearning, and safety assurance/control at deployment-time. The information/resources (papers, posters, and talk recordings) will be posted on our website for public access.

How will this funding be used?

We plan to host a social dinner after the workshop. Invitations will first be extended to our invited speakers (4 people) and panelists (4 people) to thank them for their contributions. If we have extra space (getting a dinner reservation for a large group in Vancouver in the summer is a bit challenging), we will additionally invite others, such as our outstanding paper authors. Goal: 13-15 people (incl. 5 workshop organizers).

Funding: $100 USD per person, all inclusive.
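
At that rate, the 13-15 person target works out to roughly $1,300-$1,500 in total, which lines up with the $1,500 funding goal at the upper end.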

Any extra funds will be used to support need-based travel grants for first authors of accepted papers to attend in person and present their work. ML conferences tend to be held in expensive North American cities and charge high registration fees in USD. This limits who can attend, and it is particularly cost-prohibitive for students, junior researchers without an established affiliation, and those from developing countries. We have already received 20 requests asking for more than $22k in support.

Who is on your team?

See details on our workshop website: https://actionable-interpretability.github.io/

Similar projects

Dhruv Sumathi: AI For Humans Workshop and Hackathon at Edge Esmeralda
Talks and a hackathon on AI safety, d/acc, and how to empower humans in a post-AGI world.
Science & technology · Technical AI safety · AI governance · Biosecurity · Global catastrophic risks. $0 raised.

Georg Lange: Travel Funding to present research at ICLR 2025
Science & technology · Technical AI safety. $0 raised.

Zhonghao He: Mapping neuroscience and mechanistic interpretability
Surveying neuroscience for tools to analyze and understand neural networks and building a natural science of deep learning.
Technical AI safety. $5.95K raised.

Jesse Hoogland: Scoping Developmental Interpretability
6-month funding for a team of researchers to assess a novel AI alignment research agenda that studies how structure forms in neural networks.
Technical AI safety. $145K raised.

Evžen Wybitul: Travel funding to present a poster at the ICML technical AI governance workshop
Technical AI safety · AI governance. $0 raised of a $2.5K goal.

Chris Lakin: Conceptual Boundaries Workshop (already funded, but some additional things)
Technical AI safety · Global catastrophic risks. $0 raised.

Sahil: [AI Safety Workshop @ EA Hotel] Autostructures
Scaling meaning without fixed structure (...dynamically generating it instead.)
$8.55K raised.