
Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval

The First Actionable Interpretability Workshop at ICML 2025

Science & technology · Technical AI safety

Sarah W.

Proposal · Grant
Closes July 19th, 2025
$1,500 raised
$500 minimum funding
$1,500 funding goal
Fully funded and not currently accepting donations.

Project summary

The Actionable Interpretability workshop at ICML 2025 aims to foster discussions on leveraging interpretability insights to drive tangible advancements in AI across diverse domains. We welcome contributions that move beyond theoretical analysis and demonstrate concrete improvements in model alignment, robustness, and real-world applications. Additionally, we seek to explore the challenges inherent in translating interpretability research into actionable impact.

Our areas of interest include:

  1. Practical applications of interpretability insights to address key challenges in AI such as hallucinations, biases, and adversarial robustness, as well as applications in high-stakes, less-explored domains like healthcare, finance, and cybersecurity.

  2. Comparative analyses of interpretability-based approaches versus alternative techniques like fine-tuning, prompting, and more.

  3. New model architectures, training paradigms, or design choices informed by interpretability findings.

  4. Incorporating interpretability, which often focuses on micro-level decision analysis, into more complex scenarios such as reasoning processes or multi-turn interactions.

  5. Developing realistic benchmarking and assessment methods to measure the real-world impact of interpretability insights, particularly in production environments and large-scale models.

  6. Critical discussions on the feasibility, limitations, and future directions of actionable interpretability research. We also invite perspectives that question whether actionability should be a goal of interpretability research.

We have additionally invited a diverse group of keynote speakers and panelists (8 total) from both academia and industry to discuss their efforts to make interpretability research actionable. We have 89 accepted papers, many of which make concrete contributions to technical AI safety research, on topics such as using interpretability for model steering, unlearning, and safety assurance/control at deployment time. All workshop materials (papers, posters, and talk recordings) will be posted on our website for public access.

How will this funding be used?

We plan to host a social dinner after the workshop. Invitations will first be extended to our invited speakers (4 people) and panelists (4 people) to thank them for their contributions to the workshop. If we have extra space (getting a dinner reservation for a large group in the summer in Vancouver is a bit challenging), we will additionally invite others such as our outstanding paper authors. Goal: 13-15 people (incl. 5 workshop organizers).

Funding: $100 USD per person, all inclusive (13-15 attendees comes to $1,300-$1,500, in line with the $1,500 funding goal).

Any extra funds will be used to support need-based travel grants for first authors of accepted papers to attend in person and present their work. ML conferences tend to be held in expensive North American cities and charge high registration fees in USD. This limits who can attend, and it is particularly cost-prohibitive for students, junior researchers without an established affiliation, and researchers from developing countries. We have already received 20 requests asking for more than $22k in support.

Who is on your team?

See details on our workshop website: https://actionable-interpretability.github.io/


Donation Offers

Neel Nanda: $1.5K (about 19 hours ago)