Mechanistic interpretability has proven transformative in understanding Large Language Models (LLMs), revealing intricate computational patterns and internal mechanisms. However, this powerful approach remains underutilized in computer vision, despite its potential to uncover fundamental properties of visual processing in neural networks. We recognize this as a crucial opportunity to advance our understanding of vision models through mechanistic interpretability, potentially revealing novel insights about learned visual representations and emergent algorithms.
As AI models become more integrated into critical areas like healthcare, autonomous driving, and security, it is more important than ever to understand how and why models make their decisions. Such understanding is crucial to eliminate undesired phenomena like bias, harmful content, and hallucinations. While traditional explainable AI (XAI) techniques shed some light on models' decisions with high-level abstractions, the emerging field of mechanistic interpretability adopts a scientific approach to discovering the inner workings of neural networks. Mechanistic interpretability aims to explain the behavior of neural networks by exploring their internal representations and behaviors: What do internal representations encode? What internal algorithm does a network perform? Can we reverse-engineer neural networks? What interventions should be made on the internals in order to affect model outcomes globally? What tools are needed to understand very large models? These analyses and applications go beyond the post-hoc explanations of traditional XAI methods.
Although major effort has been put into mechanistically explaining neural networks, most interpretability research has shifted toward understanding language models, leaving a gap in our understanding of vision models. We argue that mechanistic interpretability is a promising approach for understanding the basic building blocks and emergent algorithms of visual intelligence, as evidenced by a plethora of work [1-20]. Moreover, the growing interest in vision-and-language and multimodal models highlights the need for unified interpretability tools across domains. Therefore, in this workshop, we would like to foster a discussion about interpretability for vision and multimodal models.
Topics include but are not limited to:
Visualizing and Understanding Internal Components of Vision and Multimodal Models:
This involves developing methods for visualizing units of vision models such as neurons and attention heads (e.g., [1, 2, 3, 4]).
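To make this concrete, here is a minimal sketch (not taken from the cited works, but a common first step toward the kind of unit analysis in [1, 2]): hook an intermediate layer of a pretrained torchvision ResNet-50 and collect the images that most strongly activate a chosen unit. The image folder path and the unit index are placeholders.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Pretrained backbone whose units we want to probe.
model = models.resnet50(weights="IMAGENET1K_V2").eval()

# Record spatially averaged activations of the last residual stage.
activations = {}
def hook(module, inputs, output):
    activations["layer4"] = output.mean(dim=(2, 3)).detach()  # (batch, channels)
model.layer4.register_forward_hook(hook)

transform = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = ImageFolder("path/to/images", transform=transform)  # placeholder image folder
loader = DataLoader(dataset, batch_size=32)

unit = 123  # arbitrary channel index to inspect
paths = [p for p, _ in dataset.samples]
scores = []
with torch.no_grad():
    for images, _ in loader:
        model(images)
        scores.append(activations["layer4"][:, unit])
scores = torch.cat(scores)
top = scores.topk(9).indices.tolist()
print(f"Top-activating images for unit {unit}:", [paths[i] for i in top])
```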
Scaling and Automating Interpretability Methods:
How can we scale interpretability methods to larger models and beyond toy datasets for practical applications? This includes developing toolkits and interfaces for practitioners (e.g., [5]).
Evaluating Interpretability Methods:
This involves developing benchmarks and comparing interpretability methods (e.g., [6]).
Model Editing and Debiasing:
After developing methods for visualizing and understanding the internals of vision models, how can we causally intervene to change model behavior and make it safer, less biased, and better suited to specific tasks (e.g., [7, 8, 9])?
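As one concrete example of such an intervention, below is a minimal sketch of task-arithmetic-style weight-space editing in the spirit of [8]: subtracting a scaled "task vector" (the parameter-wise difference between fine-tuned and pretrained weights) to suppress a learned behavior. The checkpoint files, the scaling coefficient, and the negation choice are illustrative placeholders, not a prescription from the cited papers.

```python
import torch

def apply_task_vector(pretrained_sd, finetuned_sd, alpha=0.5, negate=True):
    """Return an edited state_dict: pretrained +/- alpha * (finetuned - pretrained).

    Negating the task vector steers the model away from the fine-tuned behavior
    (e.g., to suppress an unwanted concept); adding it steers the model toward it.
    Non-floating-point buffers (e.g., batch-norm counters) are left untouched.
    """
    sign = -1.0 if negate else 1.0
    return {
        k: (pretrained_sd[k] + sign * alpha * (finetuned_sd[k] - pretrained_sd[k]))
        if pretrained_sd[k].is_floating_point() else pretrained_sd[k]
        for k in pretrained_sd
    }

# Usage with placeholder checkpoint files:
# edited_sd = apply_task_vector(torch.load("pretrained.pt"), torch.load("finetuned.pt"))
# model.load_state_dict(edited_sd)  # `model` is the base architecture, built elsewhere
```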
Identifying Failure Modes and Correcting Them:
Can we visualize the internals of models to find shortcomings of algorithms or architectures? How can we use these findings to improve design choices (e.g., [10])?
Emergent Behavior in Vision and Multimodal Models:
Using interpretability techniques, what intriguing properties of large vision and multimodal models can we discover? Examples include the entanglement of visual and language concepts in CLIP [11] or controllable linear weight subspaces in diffusion models [12].
Representation Similarity and Universality:
Several works have found that representations learned by different model architectures, trained on different datasets, tasks, and modalities, converge [13, 14]. How can we characterize the similarity of these different models (e.g., [15])?
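For instance, linear Centered Kernel Alignment (CKA) [15] gives one simple similarity score between two sets of activations computed on the same inputs. The sketch below uses random features for two hypothetical models purely as placeholders.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between X: (n, d1) and Y: (n, d2) activations for the same n examples."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

# Example with random placeholder features from two hypothetical models:
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(512, 768))   # e.g., ViT CLS tokens for 512 images
feats_b = rng.normal(size=(512, 2048))  # e.g., ResNet pooled features for the same images
print(linear_cka(feats_a, feats_b))
```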
Understanding Vision Models with Language:
How can we develop methods to use language representations to explain visual representations (e.g., [16, 17, 18])?
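As a simple illustration (a CLIP dictionary-matching heuristic, not the generative description method of [16] or the decomposition of [17]), one can rank a candidate vocabulary of concepts against CLIP embeddings of a unit's top-activating images. The image paths and concept list below are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

top_images = [Image.open(p) for p in ["img0.jpg", "img1.jpg"]]  # placeholder paths
concepts = ["dog", "wheel", "striped texture", "water", "human face"]  # candidate vocabulary

with torch.no_grad():
    img_in = processor(images=top_images, return_tensors="pt")
    txt_in = processor(text=concepts, return_tensors="pt", padding=True)
    img_emb = model.get_image_features(**img_in)
    txt_emb = model.get_text_features(**txt_in)

# Cosine similarity between each image and each candidate description.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).mean(dim=0)  # average similarity per concept
print("Best matching description:", concepts[scores.argmax().item()])
```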
In-Context Learning:
Language models have shown impressive in-context learning capabilities. How can we elicit similar behavior from vision models (e.g., [19])?
Understanding the Role of Data and Model Behavior:
What role does data play in the learned algorithm? What biases and properties can we extract from datasets (e.g., [20, 21])?
[1] Bau, David, et al. "Network dissection: Quantifying interpretability of deep visual representations." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[2] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13. Springer International Publishing, 2014.
[3] Goh, Gabriel, et al. "Multimodal neurons in artificial neural networks." Distill 6.3 (2021): e30.
[4] Chefer, Hila, Shir Gur, and Lior Wolf. "Transformer interpretability beyond attention visualization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[5] Shaham, Tamar Rott, et al. "A multimodal automated interpretability agent." Forty-first International Conference on Machine Learning. 2024.
[6] Schwettmann, Sarah, et al. "Find: A function description benchmark for evaluating interpretability methods." Advances in Neural Information Processing Systems 36 (2024).
[7] Gandikota, Rohit, et al. "Erasing concepts from diffusion models." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
[8] Ilharco, Gabriel, et al. "Editing models with task arithmetic." The Eleventh International Conference on Learning Representations. 2023.
[9] Kumari, Nupur, et al. "Ablating concepts in text-to-image diffusion models." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
[10] Darcet, Timothée, et al. "Vision Transformers Need Registers." The Twelfth International Conference on Learning Representations. 2024.
[11] Materzyńska, Joanna, Antonio Torralba, and David Bau. "Disentangling visual and written concepts in CLIP." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[12] Dravid, Amil, et al. "Interpreting the Weight Space of Customized Diffusion Models." arXiv preprint arXiv:2406.09413 (2024).
[13] Huh, Minyoung, et al. "Position: The Platonic Representation Hypothesis." Forty-first International Conference on Machine Learning. 2024.
[14] Dravid, Amil, et al. "Rosetta neurons: Mining the common units in a model zoo." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
[15] Kornblith, Simon, et al. "Similarity of neural network representations revisited." International conference on machine learning. PMLR, 2019.
[16] Hernandez, Evan, et al. "Natural language descriptions of deep visual features." International Conference on Learning Representations. 2021.
[17] Gandelsman, Yossi, Alexei A. Efros, and Jacob Steinhardt. "Interpreting CLIP's Image Representation via Text-Based Decomposition." The Twelfth International Conference on Learning Representations. 2024.
[18] Gandelsman, Yossi, Alexei A. Efros, and Jacob Steinhardt. "Interpreting the Second-Order Effects of Neurons in CLIP." arXiv preprint arXiv:2406.04341 (2024).
[19] Bar, Amir, et al. "Visual prompting via image inpainting." Advances in Neural Information Processing Systems 35 (2022): 25005-25017.
[20] Torralba, Antonio, and Alexei A. Efros. "Unbiased look at dataset bias." CVPR 2011. IEEE, 2011.
[21] Dunlap, Lisa, et al. "Describing differences in image sets with natural language." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
The workshop on mechanistic interpretability for vision is highly relevant to the broader computer vision community, extending beyond the CVPR audience, by addressing the cutting-edge challenge of explaining and guiding the mechanisms that emerge in vision models. As the computer vision community has shifted from hand-crafted feature design to large-scale training-based approaches, the mechanisms in learned models have become less transparent and therefore less trustworthy. Mechanistic interpretability presents an innovative approach for understanding how such models solve specific tasks and introduces a path for tackling a new range of vision tasks by repurposing mechanisms learned and extracted from these models. As large-scale pre-training becomes increasingly important and expensive, this approach has the potential to reshape how computer vision systems adapt and generalize to new tasks. By convening pioneers in mechanistic interpretability for computer vision, this workshop fosters collaborative discussions and knowledge sharing, contributing to the growth and maturation of mechanistic interpretability as a transformative paradigm in computer vision research and applications.
There are no specific ethical considerations. In fact, we believe advancing interpretability work has the potential to address many of the current ethical concerns in computer vision, as it will allow us to audit models for unsafe behavior such as bias and hallucination.
We are not aware of any past related workshops at computer vision conferences about mechanistic interpretability. Besides keynote speakers and a paper submission track, our workshop will include a practical introductory tutorial showcasing practical tools and standard programming techniques for the interpretability of vision models.
While other workshops may include talks and submissions related to explainability (The 3rd Explainable AI for Computer Vision (XAI4CV) Workshop), fairness (The Fifth Workshop on Fair, Data-efficient, and Trusted Computer Vision), and responsible design (Responsible Generative AI Workshop), these workshops do not focus on analyzing the computational process in vision models. Our workshop is unique in that it focuses on questions related to the internal mechanisms that emerge in vision models and on the analysis of what algorithms these models learn, rather than using black-box approaches to explain model behavior.
Interpretability workshops are often included as part of machine learning conferences (e.g., mi-icml24), where they are very popular. However, these workshops are mostly devoted to interpreting language models. Despite a growing interest in interpreting vision models, this research community is still relatively small compared to the language model interpretability community. We believe this stems from the fact that natural language processing has traditionally been closely tied to safety, and safety concerns are commonly tackled in the production of large language models. Nevertheless, vision is already being incorporated into large models and is becoming an integral part of many products, and this trend will only grow. Therefore, we aim to foster interpretability research in the vision community.
We are not aware of other workshop submissions that would be suitable for a joint workshop track.
ORGANIZERS AND SPEAKERS
Prof. Philip Torr (email, website, scholar)
Philip Torr did his PhD (DPhil) at the Robotics Research Group of the University of Oxford under Professor David Murray of the Active Vision Group. He worked for another three years at Oxford as a research fellow, and still maintains close contact as a visiting fellow there. He left Oxford to work for six years as a research scientist for Microsoft Research, first in Redmond, USA, in the Vision Technology Group, then in Cambridge, UK, founding the vision side of the Machine Learning and Perception Group. He then became a Professor in Computer Vision and Machine Learning at Oxford Brookes University, where he brought in over one million pounds in grants as a principal investigator. In 2013 he returned to Oxford as a full professor, where he has established the Torr Vision Group and brought in over five million pounds of funding. Philip Torr has won several awards, including the Marr Prize (the highest honour in vision) in 1998. He is a Royal Society Wolfson Research Merit Award holder. He and members of his group have won several other awards, including an honorable mention at the NIPS 2007 conference for the paper P. Kumar, V. Kolmogorov, and P.H.S. Torr, "An Analysis of Convex Relaxations for MAP Estimation," NIPS 2007, and a Best Paper (oral) award at CVPR 2008 for O. Woodford, P.H.S. Torr, I. Reid, and A.W. Fitzgibbon, "Global Stereo Reconstruction under Second Order Smoothness Priors." More recently he was awarded best science paper at BMVC 2010 and ECCV 2010. He was involved in the algorithm design for Boujou, released by 2D3; Boujou has won a clutch of industry awards, including the Computer Graphics World Innovation Award, the IABM Peter Wayne Award, the CATS Award for Innovation, and a technical Emmy. He worked closely with this Oxford-based company, as well as other companies such as Sony, on the Wonderbook project. He is a director of the Oxford-based spin-out OxSight.
Eli Shechtman (email, website, scholar)
Eli Shechtman is a Senior Principal Scientist in the Imaging Lab, Adobe Research. He received his B.Sc. in Electrical Engineering (cum laude) from Tel-Aviv University in 1996 and his M.Sc. and Ph.D. with honors in Applied Mathematics and Computer Science from the Weizmann Institute of Science in 2003 and 2007. He then joined Adobe while also sharing his time as a post-doc at the University of Washington between 2007 and 2010. He has published over 100 academic publications. Two of his papers were chosen to be published as "Research Highlight" papers in the Communications of the ACM (CACM) journal. He served as a Technical Papers Committee member at SIGGRAPH 2013 and 2014, was an Area Chair at ICCV 2015, 2019, and 2021, CVPR 2015, 2017, and 2020, and ECCV 2022, and was an Associate Editor for TPAMI from 2016 to 2020. He has received several honors and awards, including the Best Paper prize at ECCV 2002, a Best Paper award at WACV 2018, a Best Paper Runner-Up at FG 2020, and the Helmholtz "Test of Time Award" at ICCV 2017. His research lies at the intersection of computer vision, computer graphics, and machine learning. In particular, he focuses on generative modeling and editing of visual data. Some of his research can be found in Adobe's products, such as Adobe Firefly, Photoshop's Generative Fill, Remove tool, and Content-Aware Fill, Content-Aware Fill for Video in After Effects, Upright in Lightroom, and Characterizer in Character Animator, as well as in more than 100 issued US patents. Eli leads a team of top researchers and research engineers.
Tamar Rott Shaham (email, website, scholar) Primary organizer
Tamar Rott Shaham is a postdoctoral fellow at CSAIL, MIT, in Antonio Torralba’s lab. Her research focuses on interpreting neural networks. She received her PhD from the Technion under the supervision of Prof. Tomer Michaeli. Tamar has received several awards, including the ICCV 2019 Best Paper Award (Marr Prize), the Google WTM Scholarship (2019), and the Schmidt Postdoctoral Award (2022).
During her postdoctoral studies at MIT, Tamar has focused on developing automated interpretability techniques and evaluation methods for interpretability schemes.
Yossi Gandelsman (email, website, scholar) Secondary organizer
Yossi is a computer science PhD student at UC Berkeley, advised by Alexei Efros, and a visiting researcher at Meta. Before that, he was a member of the Perception team at Google Research (now Google DeepMind). He completed his M.Sc. at the Weizmann Institute of Science, advised by Prof. Michal Irani. His research centers around deep learning, computer vision, and mechanistic interpretability.
Yossi’s research focuses on developing automated mechanistic interpretability approaches for understanding, steering and reverse-engineering vision models and multimodal models.
Joanna Materzyńska (email, website, scholar)
Joanna Materzyńska is a PhD candidate in Computer Vision and Machine Learning at the Massachusetts Institute of Technology (MIT) under Prof. Antonio Torralba. She holds a Master's degree from Oxford University, where she collaborated with Prof. Philip Torr and the Darrell lab at UC Berkeley. Joanna has contributed to generative AI projects for content creation at both Netflix and Adobe through research internships. Prior to her Master's degree, she was an early employee at the AI startup Twenty Billion Neurons GmbH, which was acquired by Qualcomm in 2021. Joanna is known for her work on the interpretability of deep generative models, with work that enables concepts to be disentangled or completely removed from trained models, with implications for artists' rights to remove their work from AI training data and for copyright law. Joanna's previous work focused on video understanding and on action and gesture recognition. Joanna was also part of the team that developed the first simulator for training machine learning algorithms for self-driving applications.
Rohit Gandikota (email, website, scholar)
Rohit is a PhD student at Northeastern University, advised by Prof. David Bau, working on knowledge understanding and controllability for diffusion and language models. His main research focuses on safety, interpretability, and applied research. Before that, he was a scientist at the Indian Space Research Organization, working on advanced remote sensing, disaster management systems, and vision for extraterrestrial landing.
Amil Dravid (email, website, scholar)
Amil is a PhD student at UC Berkeley advised by Prof. Alexei Efros. He is a US Department of Energy Computational Science Graduate Fellow. He completed his BS in Computer Science at Northwestern University, during which he visited as an undergraduate researcher at Caltech. His research interests include understanding emergent properties of vision models and the science of deep learning.
Ashkan Khakzar (email, website, scholar)
Dr. Ashkan Khakzar is a postdoctoral researcher at the University of Oxford, where he works with Prof. Philip Torr, focusing on the interpretation, evaluation, and steering of multimodal foundation models. He completed his PhD at the Technical University of Munich (summa cum laude, advised by Prof. Nassir Navab and Prof. Bernt Schiele), focusing on explaining vision neural networks. He has published in top conferences (CVPR, NeurIPS, ECCV, MICCAI, and others). He led the organization of the EVAL-FoMo workshop at ECCV 2024.
Four of the junior organizers (Tamar, Joanna, Yossi, and Ashkan) have organized workshops at past vision conferences.
All senior organizers have rich experience in organizing workshops during their academic careers.
Prof. Antonio Torralba (email, website, scholar, confirmed in-person talk) is the director of the AI+D Faculty, a Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT), former MIT director of the MIT-IBM Watson AI Lab, and the former inaugural director of the MIT Quest for Intelligence, an MIT campus-wide initiative to discover the foundations of intelligence. He is also a member of the Center for Brains, Minds and Machines. He received the degree in telecommunications engineering from Telecom BCN, Spain, in 1994 and the Ph.D. degree in signal, image, and speech processing from the Institut National Polytechnique de Grenoble, France, in 2000. From 2000 to 2005, he was a postdoctoral researcher at the Brain and Cognitive Sciences Department and the Computer Science and Artificial Intelligence Laboratory at MIT, where he is now a professor. Prof. Torralba is an Associate Editor of the International Journal of Computer Vision and served as program chair for the Computer Vision and Pattern Recognition (CVPR) conference in 2015. He received the 2008 National Science Foundation (NSF) CAREER award, the best student paper award at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2009, and the 2010 J. K. Aggarwal Prize from the International Association for Pattern Recognition (IAPR). In 2017, he received the Frank Quick Faculty Research Innovation Fellowship and the Louis D. Smullin ('39) Award for Teaching Excellence. In 2020 he won the PAMI Mark Everingham Prize and became an AAAI Fellow. In 2021 he won the inaugural CVPR Thomas Huang Memorial Prize.
Prof. Torralba has a long list of papers investigating the role of computations inside vision models, including methods for network dissection, automated interpretability, and benchmarks.
Prof. David Bau (email, website, scholar, confirmed in-person talk)
David Bau is an assistant professor at the Northeastern University Khoury College of Computer Sciences. He is a pioneer of deep network interpretability and model editing methods for large-scale generative AI, such as large language models and image diffusion models, and he is the author of a textbook on numerical linear algebra. He has developed software products for Google and Microsoft, and he is currently leading an effort to create a National Deep Inference Fabric to enable scientific research on large AI models. He received his PhD from the Massachusetts Institute of Technology, his MS from Cornell University, and his BA from Harvard University.
Prof. Michal Irani (website, scholar, email, confirmed in-person talk)
Michal Irani is a Professor at the Weizmann Institute of Science, Israel, in the Department of Computer Science and Applied Mathematics. She received a B.Sc. degree in Mathematics and Computer Science from the Hebrew University of Jerusalem, and M.Sc. and Ph.D. degrees in Computer Science from the same institution. During 1993-1996 she was a member of the Vision Technologies Laboratory at the Sarnoff Research Center (Princeton). She joined the Weizmann Institute in 1997. Michal's research interests center around computer vision, image processing, artificial intelligence, and video information analysis. Michal's prizes and honors include the David Sarnoff Research Center Technical Achievement Award (1994), the Yigal Alon three-year Fellowship for Outstanding Young Scientists (1998), the Morris L. Levinson Prize in Mathematics (2003), the Maria Petrou Prize (awarded by the IAPR) for outstanding contributions to the fields of Computer Vision and Pattern Recognition (2016), the Landau Prize in the field of Artificial Intelligence (2019), and the Rothschild Prize in the fields of Mathematics/Computer Science/Engineering (2020). She also received the ECCV Best Paper Award in 2000 and in 2002, and was awarded the Honorable Mention for the Marr Prize in 2001 and in 2005. In 2017 Michal received the Helmholtz Prize (the "Test of Time Award") for the paper "Actions as space-time shapes".
Prof. Irani’s current research addresses the connections between internal representations of deep-learning vision models and the brain activity responsible for human vision.
Prof. Aleksander Madry (website, scholar, email, confirmed in-person/online talk)
Aleksander Madry is a Member of Technical Staff at OpenAI, the Cadence Design Systems Professor of Computing (on leave) at MIT, and the Director of the MIT Center for Deployable Machine Learning, as well as a Co-Lead of the MIT AI Policy Forum. His research aims to understand machine learning from a robustness and deployability perspective. Aleksander's work has been recognized with a number of awards, including a National Science Foundation CAREER Award, an Alfred P. Sloan Research Fellowship, an Association for Computing Machinery Doctoral Dissertation Award Honorable Mention, and a Presburger Award. He received his PhD from MIT in 2011. Prior to joining the MIT faculty, he spent time at Microsoft Research New England and on the faculty of the Swiss Federal Institute of Technology in Lausanne (EPFL).
Prof. Trevor Darrell (website, scholar, email, confirmed in-person talk)
Trevor Darrell is on the faculty of the CS and EE Divisions of the EECS Department at UC Berkeley. He founded and co-leads the Berkeley Artificial Intelligence Research (BAIR) Lab, the Berkeley DeepDrive (BDD) Industrial Consortium, and the recently launched BAIR Commons program in partnership with Facebook, Google, Microsoft, Amazon, and other partners. He also was Faculty Director of the PATH research center at UC Berkeley, and led the Vision group at the UC-affiliated International Computer Science Institute in Berkeley from 2008 to 2014. Prior to that, Prof. Darrell was on the faculty of the MIT EECS department from 1999 to 2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996 to 1999, and received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988.
Prof. Darrell has published multiple papers on explainable AI and the interpretability of multimodal models.
Prof. Alexei (Alyosha) Efros (email, website, scholar, confirmed panelist)
Alexei A. Efros is a professor in the EECS Department at UC Berkeley, where he is part of the Berkeley Artificial Intelligence Research Lab (BAIR). The central goal of his research is to use vast amounts of unlabelled visual data to understand, model, and recreate the visual world around us. His research has been mainly in data-driven computer vision, as well as its projection onto computer graphics and computational photography. In the last five years, his lab has been at the forefront of reviving self-supervised learning. His other interests include human vision, visual data mining, robotics, and the applications of computer vision to the visual arts and the humanities.
Sonia Joseph (website, email, confirmed in-person tutorial)
Sonia is a PhD student in Computer Science at Mila, co-supervised by Dr. Blake Richards, and a visiting researcher at Meta on the JEPA team, co-supervised by Dr. Mike Rabbat. Her research focuses on the advancement and understanding of multimodal models, with an emphasis on vision and video model interpretability. Prior to starting her doctoral studies, Sonia worked at the Janelia Research Campus with Dr. Carsen Stringer, comparing the mouse visual cortex to deep neural nets and optic flow nets, and as a machine learning engineer at AI startups in San Francisco. Sonia holds a master's degree in Computer Science from McGill University and a bachelor's degree from Princeton University. During her undergraduate studies, she conducted research at the Princeton Neuroscience Institute, where she wrote her thesis on predicting brain semantics using BERT embeddings.
Sonia is the main author and developer of PRISMA, a library for mechanistic interpretability of vision transformers.
To reflect a diverse array of perspectives, our speakers come from both academia and industry. Our organizers and speakers originate from North America, Europe, the Middle East, and Asia, and include both male and female computer scientists, from young students to senior professors. As a major part of our workshop goal, our lineup of speakers reflects research done in a diverse array of disciplines, including Computer Vision, NLP, and Neuroscience.
Tamar Rott Shaham
3 days ago
I think it is important to arrange a social dinner as part of the mechanistic interpretability for vision workshop (transferring back from charity to cash)
Neel Nanda
4 days ago
I recommended that the organisers of the workshop apply here for funding. I think that mechanistic interpretability is an impactful field for safety, and I care about there being a thriving academic field. Workshops at top conferences are fairly high-leverage ways for people to form connections, spark potential research collaborations, introduce junior people to senior people, etc. The workshop will happen regardless, but I am happy to give them a bit of funding to do additional things like organising dinners for speakers + organisers + other notable people.