Advancing Scalable Oversight by finding the best ways to use the complementary strengths of humans and AI to identify harm in conversations. Funding for the Spring 2026 SPAR project (https://sparai.org/projects/sp26/recu4ePI8o6thONSs).
Datasets: Build a dataset of tasks that represent identifying harm in AI conversations across a wide range of domains (health, computer control, etc.), where baseline humans and AIs get <70% accuracy. We will collect these datasets from existing literature (building on our work in the Fall) and create new datasets via techniques like tampers.
Methods: Develop methods for Complementarity on those datasets. We will build on and improve the confidence-calculation, hybridization, and sub-task-delegation methods we developed in the Fall, and introduce new assistance methods.
Platform: Develop a Human Rating platform that fixes many of the issues in existing Human Rating platforms used in academia. This is worth the time investment for our project alone, and we also aim to let all other projects collecting Human Ratings use it for free, making it easier to incorporate humans in the loop.
$20k from SPAR for the Spring 2026 round. $3.5k of that will go to inference costs, $500 to platform hosting costs, and $16k to human experiment costs.
For the previous Fall 2025 project, we received $16k from SPAR, and $2k from me (Rishub). 90% of these costs went to human experiments with Prolific, and 10% went to inference costs.
Over the remaining 2 months of the SPAR project (plus 1-3 months of potential wrap-up work), the funding will be used for:
$6k for inference costs, to try larger models and more confidence-calculation techniques
$15k for human experiment costs
$5k for fine-tuning experiments on improving confidence-calculation
$4k for Claude Max (5x) for 2 months, for the ~half of mentees who don't have it yet.
We have a talented team of 25 advisors and mentees. I’m leading it, and have worked at GDM for ~7 years, spending 2 years on Scalable Oversight (paper) and the other 5 on other high-impact projects like AlphaFold 2 and 3.