Train great open-source sparse autoencoders

Technical AI safety
Tom McGrath

Active grant
$4,025 raised
$10,000 funding goal

Project summary

tl;dr: determine the best currently available training setup for SAEs and disseminate this knowledge. Train SAEs for steadily larger models (starting with GPT-2-small for MATS scholars), then scale up as budget and time allow.

Project proposal doc with more details: https://docs.google.com/document/d/15X28EEHo7pM2CYkfZqk05A0MZi4ImvTSSVaC9wtFLyI/edit?usp=sharing
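
For concreteness, here is a minimal sketch of the kind of sparse autoencoder in question: a standard single-layer ReLU SAE trained with an L1 sparsity penalty, written in PyTorch. All dimensions, initialisations, and coefficients below are illustrative assumptions rather than the project's choices; which variants of this recipe work best is exactly what the comparison below is meant to settle.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Single-layer ReLU SAE: reconstructs LLM activations through an
    overcomplete, sparsely-activating feature dictionary."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        # Encode: sparse, non-negative feature activations.
        f = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # Decode: reconstruct the original activation from active features.
        x_hat = f @ self.W_dec + self.b_dec
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes feature
    # activations towards sparsity; l1_coeff trades off the two terms.
    recon = (x - x_hat).pow(2).sum(dim=-1).mean()
    sparsity = f.abs().sum(dim=-1).mean()
    return recon + l1_coeff * sparsity
```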

What are this project's goals and how will you achieve them?

  1. Determine good hyperparameters for sparse autoencoders on realistic LLMs via a comprehensive architecture and hyperparameter comparison (a sketch of such a sweep follows this list).

  2. Use this knowledge to train a suite of high-quality SAEs for GPT-2-small, then scale up further as resources allow, targeting ~1B and ~8B models in sequence.

  3. Disseminate knowledge on SAE training through a technical report.
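
As a rough illustration of the sweep in goal 1, the loop below grid-searches expansion factor, L1 coefficient, and learning rate, reusing the `SparseAutoencoder` and `sae_loss` from the sketch above. The grid values are assumptions, `get_activation_batch` is a placeholder for streaming real LLM activations, and runs would be compared on the usual reconstruction-vs-sparsity frontier.

```python
import itertools
import torch

# Hypothetical search space; the real one is what goal 1 will determine.
grid = {
    "expansion_factor": [8, 16, 32],  # d_sae = expansion_factor * d_model
    "l1_coeff": [1e-4, 5e-4, 1e-3],
    "lr": [1e-4, 3e-4],
}

d_model = 768  # GPT-2-small residual stream width

def get_activation_batch() -> torch.Tensor:
    # Placeholder: in practice, stream residual-stream activations from the LLM.
    return torch.randn(4096, d_model)

results = []
for ef, l1, lr in itertools.product(*grid.values()):
    sae = SparseAutoencoder(d_model, ef * d_model)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(1000):  # illustrative; real runs see billions of tokens
        x = get_activation_batch()
        x_hat, f = sae(x)
        loss = sae_loss(x, x_hat, f, l1_coeff=l1)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Compare runs on the reconstruction-vs-sparsity frontier (MSE vs. L0).
    with torch.no_grad():
        x = get_activation_batch()
        x_hat, f = sae(x)
        mse = (x - x_hat).pow(2).sum(dim=-1).mean().item()
        l0 = (f > 0).float().sum(dim=-1).mean().item()
    results.append({"expansion": ef, "l1": l1, "lr": lr, "mse": mse, "l0": l0})
```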

How will this funding be used?

Compute!

Who is on your team and what's your track record on similar projects?

Lead: Tom McGrath - former DeepMind interpretability researcher.

Collaborating: Joseph Bloom - owner of SAELens and contributor to Neuronpedia.

What are the most likely causes and outcomes if this project fails? (premortem)

Failure to replicate results obtained by major labs, leading to low SAE performance.

What other funding are you or your project getting?

Tom McGrath: none, currently self-funding

My collaborators Joseph Bloom and Johnny Lin are funded to work on Neuronpedia.

Similar projects
Glen M. Taggart
Independent research to improve SAEs (4-6 months)
By rapid iteration on possible alternative architectures & training techniques
Technical AI safety · $55K raised

Lovis Heindrich
Understanding SAE features using Sparse Feature Circuits
Technical AI safety · $11K raised

Kunvar Thaman
Exploring feature interactions in transformer LLMs through sparse autoencoders
Technical AI safety · $8.5K raised

Matthew A. Clarke
Salaries for SAE Co-occurrence Project
Working title: “Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces”
Science & technology, Technical AI safety · $0 raised

Robert Krzyzanowski
Scaling Training Process Transparency
Compute and infrastructure costs
Technical AI safety · $5.15K raised

Scott Viteri
Attention-Guided-RL for Human-Like LMs
Compute Funding
Technical AI safety · $3.1K raised

Hannes Thurnherr
Research on SAE-Feature-Circuits
I'm writing a paper on the connections between SAE features and how to use them to interpret and modify a model's behaviour.
$0 raised

Alex Cloud
Compute for 4 MATS scholars to rapidly scale promising new method pre-ICLR
Technical AI safety · $16K raised