Manifund foxManifund
Home
Login
About
People
Categories
Newsletter
HomeAboutPeopleCategoriesLoginCreate

Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval
3

3 Months Career Transition into AI Safety Research & Software Engineering

Technical AI safety
seanpetersau avatar

Sean Peters

ProposalGrant
Closes October 24th, 2025
$32,000raised
$10,000minimum funding
$32,000funding goal
Fully funded and not currently accepting donations.

Project summary

I’m seeking career-transition support to work 4 days/week in AI safety with a focus on dangerous capability evaluation and control.

My near-term research interest is in evaluation methodology to measure model covert capability time horizons against control protocol monitors. With a first exploration into cyber-offense covert tasks during ordinary software work.

I have a work-in-progress proposal that sketches a broader agenda. Early feedback has been positive but suggested a significant narrowing of scope, which I’m doing.

What are this project's goals? How will you achieve them?

North Star: Contribute to robust, decision-useful metrics of model covert capability and control protocols.

  • How much covert work models can do, while working on legitimate assigned tasks without detection, aligned to real risk profiles (e.g., loss-of-control)? 

  • How do Control Protocols change the scope and profile of model covert capabilities?

  • The time-horizon framing (“covert horizons”) is my guiding hypothesis here.

METR’s time-horizon methodology coupled with AI Control primitives provides us with a useful approach to measuring model covert capability performance – measuring the time horizons at which an untrusted model U ,can achieve a covert objective C, while completing a legitimate task T, without detection under a monitor M

Goals (examples, not commitments)

  • Build research expertise and early research outputs to contribute to this North Star.

    • Build a small, tightly specified dataset of tasks T and C, pairing ordinary software tasks with cyber-offense side goals.

    • Iterate on the dataset and time horizon methodology until they reliably elicit the target covert behaviour (finish the main task and complete the side goal without detection) and I have high-conviction evidence of soundness, measurement validity, and the intuition to explain why.

    • Run early pre-pilot human evaluation probes e.g. self-tests or N=1 contractor evaluations to sanity-check the above before considering anything larger.

  • Contribute to control and evaluation open-source software ecosystem e.g. ControlArena and Inspect

How I’ll work

  • Stay flexible and pivot toward the highest-leverage track (e.g., task creation, baselining, or even research agenda pivots)

  • Maintain regular contact with mentors/collaborators (AI control, model evaluation, cybersecurity, statistics)

  • Prioritise measurement validity over scale: keep dataset small and conditions limited; aim for high internal validity and repeatability within the “covert horizons” framing.

  • Being deliberately mindful of: elicitation and refusals/sandbagging; agent setup and scaffolding; token/context budgets; task-pairing dynamics; construct validity; and the unknown-unknowns that I anticipate will surface along the way.

How will this funding be used?

  • 3 months, total $32,000 USD, start Nov 1

  • Stipend: $30,000 (4 days/week; equipment/tooling folded in)

  • Model API Credits: $2,000

Who is on your team? What's your track record on similar projects?

Team: just me (Blue Mountains/Sydney, Australia)

Track record:

  • Over a decade of experience as a software engineer working at the intersection of a variety of complex scientific domains (cell culture, proteomics, microkernels, astronomy)

  • For the last four years I've worked at Vow, who have grown from a small-time Australian startup into the world leaders in cell cultured meat.

  • Early work related to my targeted research agenda include

    • AI Task Length Horizons in Offensive Cybersecurity [Complete]

    • Measuring Deceptive Capability with Human Time Horizons and Control Protocols [Draft project proposal]

Transparency

  • I have not worked in a full-time researcher role in a very long time. Always either a research engineer or research adjacent software engineer.

  • I have never worked in an independently funded capacity.

What are the most likely causes and outcomes if this project fails?

Causes

  • Isolation/time-zone: Working from (semi-rural) Australia makes it harder to stay plugged into the research loop. I may fail to engage well with collaborators and mentors.

  • Ramp-up risk: There are a number of domains that will require deepening of expertise and I may struggle; AI Control, cybersecurity, statistical methods

Mitigations

  • Regular and multiple mentor/collaborator catch-ups across AI control, eval, cybersecurity, and statistics.

  • A small weekly report mailing list for high-quality and interested readers who feel comfortable giving me very candid feedback!

Outcomes

  • I don’t ship useful research or code, and I don’t connect meaningfully with the community.

  • I would pivot back to safety-aligned software engineering roles.

  • If that isn’t viable, I would need to pivot out of AI safety

How much money have you raised in the last 12 months, and from where?

$0.

  • Pending: UK Alignment Fund (submitted)

  • Pending: Open Phil career-transition (in progress).

If this grant (or any other grant) meets my funding needs, I’ll coordinate/withdraw to avoid double funding.


Comments1Offers1Similar6
offering $32,000
joel_bkr avatar

Joel Becker

about 10 hours ago

I asked Sean to put up this proposal and have fully funded his request.

The case for this grant is extremely straightforward: my sense (from ~1.5 years at METR) is that of all the independent external people building directly on top of METR work, Sean is the person whose work my colleagues have been most excited about. Sean appears to be smart, agentic, and focused. My sense is that he's working ~1.5 days per week on these projects basically due to funding constraints, which seems crazy; I'm very excited to support him increasing this to 4.

I could say some more specific things (excited for Sean's connections to pen testers, excited for him to contribute to sabotage monitoring projects, excited for his past work measuring time horizons in offensive cybersecurity , etc.) but mainly my decision comes down to my colleagues' rare degree of excitement for whatever Sean has in store next. I'm thrilled to bet on that.