Main points in favor of this grant
I think determining the best training setup for SAEs is highly valuable. Lots of new ideas are arising about how to train these things well (e.g. Gated SAEs, ProLU, Anthropic's April update), with wildly varying amounts of rigour behind them, and often little effort put into replicating them and seeing how they combine. A rigorous and careful effort to do this seems of significant value to the mech interp community.
Tom is a strong researcher, though he hasn't worked on SAEs before; I thought the Hydra Effect and Understanding AlphaZero were solid papers. Joseph is also solid and has a lot of experience with SAEs. I expect them to be a good team.
Donor's main reservations
The Google DeepMind mech interp team has been looking somewhat into how to combine the Anthropic April update methods and Gated SAEs, and also hopes to open source SAEs at some point, which raises some concern about duplicated work. As a result, I'm less excited about significant investment into open source SAEs, though having some out (especially soon!) would be nice.
This is an engineering-heavy project, and I don't know much about Tom's engineering skills, though I have no reason to think they're bad.
Process for deciding amount
As above, I'm less excited about significant investment into open source SAEs, which is the main reason I haven't funded the full amount. $4K is a fairly small grant, so I haven't thought too hard about exactly how much compute this should reasonably take. If the training methods exploration turns out to take much more compute than expected, I'd be happy to increase the grant.
Conflicts of interest
Please disclose e.g. any romantic, professional, financial, housemate, or familial relationships you have with the grant recipient(s).
Tom and I somewhat overlapped at DeepMind, but never directly worked together.
Joseph is one of my MATS alumni and is currently in my MATS extension program. I consider this more of a conflict of interest, but my understanding is that Tom is predominantly driving this project, with Joseph helping out where he can.
I expect my MATS scholars to benefit from good open source SAEs existing, and both my scholars and the GDM team to benefit from better knowledge of how to train SAEs, but only in the same way that the whole mech interp ecosystem benefits.