The Interpretability On-Ramp: A Comprehensive Guide for New Researchers

Project summary

1. What are this project's goals? How will you achieve them?

The primary goal of this project is to create a high-quality, accessible, and practical curriculum for newcomers to the field of mechanistic interpretability (MI). I will do this by documenting my own research journey, turning my findings into a series of tutorials, and concluding with a small, novel research contribution. The aim is to lower the barrier to entry for aspiring MI researchers and help grow the talent pool in this critical area of AI safety. All output will be open-source.

The project is structured in four phases over 11 months:

Phase 1: The Fundamentals (Months 1-3): I will build and explain foundational concepts from scratch, delivering a series of blog posts (e.g., "Transformers from Scratch") with heavily commented code notebooks.
Phase 2: Core MI Techniques (Months 4-7): I will replicate and explain the results of foundational MI papers, implementing techniques like causal tracing and path patching to understand known circuits.
Phase 3: Exploring the Frontier (Months 8-10): I will apply the learned techniques to a newer open-source model and a less-studied behavior to produce a small, novel research finding.
Phase 4: Synthesis and Curation (Month 11): I will package all work into a cohesive, easy-to-navigate curriculum, with a central GitHub repository and a final "Roadmap" article.

2. How will this funding be used?

This grant will enable me to dedicate myself to this project full-time. The total funding goal is $36,500.

Hardware Purchase: $8,000
- To purchase a high-performance local machine (e.g., a MacBook Pro with an M-series chip). This is a one-time investment intended to speed up the whole project. Large-scale experiments will run in the cloud, but all local development, debugging, smaller-scale experiments, and content creation (writing, screen recording) will be done on this machine, which should substantially shorten iteration cycles.
Researcher Stipend: $22,000
- To reflect my commitment to the project, I have reduced my stipend request and redirected part of it toward this hardware. The $22,000 stipend will cover essential living costs for the 11-month duration (approximately $2,000/month).
Cloud Compute Costs: $6,000
- This budget covers renting powerful cloud GPUs (e.g., A100s) for the large-scale experiments in Phase 3, which exceed what a local machine can handle.
Overhead: $500
- To cover miscellaneous costs such as software subscriptions or website hosting.

3. Who is on your team? What's your track record on similar projects?

I am Sonu Babu, the sole researcher on this project. I am a Computer Science postgraduate student (Integrated MCA, 5-year program) at Rajagiri College of Social Sciences with strong foundations in algorithms and AI. My academic work is complemented by a deep, self-directed focus on AI safety. I am an active participant in the AI Safety Fundamentals courses (both the Alignment and Governance tracks). My primary focus is mechanistic interpretability, where I have been independently studying Neel Nanda's materials, Anthropic's Transformer Circuits thread, and the ARENA curriculum. I am also actively learning PyTorch and exploring GPU compute frameworks to prepare for hands-on research.

My practical experience includes founding Entropy-com, where I built end-to-end AI tools such as a chatbot and a summarizer. More recently, as a Builder at Market01, I developed a Bloomberg-style GPU analytics terminal using the Vast.ai API, which gave me direct experience monitoring compute resources. My strengths in independent study, technical writing, and problem-solving are well suited to this project's goals.

GitHub: github.com/itsfingerlickinggood
LeetCode: leetcode.com/u/sonuipad05

4. What are the most likely causes and outcomes if this project fails?

Risk 1: Slower-than-expected progress. The project is modular, so even if only 75% of the planned content is completed, it would still result in a valuable multi-part introduction to MI. I will prioritize quality over quantity.
Risk 2: The novel research phase yields null results. A detailed write-up of a "failed" investigation, covering the methods tried and the challenges faced, would itself be a valuable learning resource for other beginners and would therefore still serve the project's goals.
Risk 3: Hardware dependency. The plan relies on a single local machine for development, so a critical hardware failure could cause delays. To mitigate this, all work will be version-controlled and continuously backed up to remote repositories.

5. Explanation of Funding Tiers

With the minimum funding of $19,000, I can complete a condensed 6-month version of this project covering the foundational curriculum (Phases 1 and 2). This plan would rely on my existing hardware and the cloud compute budget, resulting in a slower development cycle.

The full funding goal of $36,500 covers the complete 11-month project. It also funds the purchase of a high-performance local machine, which will accelerate development and content creation. The time saved helps make the novel research in Phase 3 feasible within the schedule and supports a higher-quality final output.

6. How much money have you raised in the last 12 months, and from where?

I have not raised any money for this project in the last 12 months. This would be the first grant supporting this research direction.