Update Oct 24: the projects have been making exciting progress! There's more work to do, so I'm granting my remaining $200k to support it. Really excited about this!
Update from Ethan:
Some updates on how the last funding was used, and what future funding would be used for:
With the help of your last grant, two of our projects wrapped up and turned into (IMO) pretty exciting ICLR submissions -- one investigating whether human feedback is at fault for sycophancy in language models, and another extending RLHF to vision-and-language models (in the hope of facilitating process-based training of models like GPT4+)
I've still got 3 projects in flight:
For one of these, we've found a way to make chain-of-thought reasoning more representative of the model's actual process for solving tasks, which I think will be pretty helpful both for improving model transparency and for process-based training schemes; we'll probably have a paper on this in ~2 months
The other 2 projects (debate + model organisms of reward hacking) are in flight and making good progress, and I'm optimistic we'll have some interesting public results within ~4 months (we already have some results that are interesting to discuss publicly, but we probably want to do more work before starting to put together a paper)
I might also start up new projects with winter MATS scholars or other external-to-Anthropic collaborators; all of these could benefit from funding for OAI API credits
Our current runway is ~6 weeks, and we expect our compute expenses to go up a bit, since we're slated to run compute-intensive experiments for the debate project