3 months full-time contributing software to Inspect

Goals:

How I'll achieve them:

Concrete ways:
- Port benchmarks not yet in Inspect (e.g. TheAgentCompany, RE-Bench, and MLGym).
- Develop Python packages implementing collections of Inspect solvers, tools, and scorers (e.g. Inspect Cyber).
- Implement realistic test environments like WebArena for testing a wider range of agent scenarios in contained settings.
- Build tools for analyzing log files and reviewing transcripts to identify reasons for failure (if Docent isn’t doing all of this).
- Build tools for presenting collections of results in dashboards (i.e. contribute to https://github.com/ArcadiaImpact/inspect_evals_dashboard).
- Build tools for LM agents to use (e.g. search https://github.com/aorwall/moatless-tools for tools that might be useful and build them in Inspect).

Default ways/in general:
- Try to complete open issues in Inspect repos.
- Ask the developers in the Inspect Slack workspace how to contribute.

This is meant to replace as much of my salary in industry as possible (which would mean about $15,000 per month).

Just me. I maintain open source projects like SAELens, neuronpedia, and SAEDashboard.

Causes:

I don't:
- Ramp up on the codebase fast enough.
- Have enough work for 1 month full-time.

Outcomes:

I don't:
- Make any significant improvements to Inspect.
- Add many evals to Inspect.
- Know my fit as a software engineer for evals.
- Build career capital.

None.
