Manifund foxManifund
Home
Login
About
People
Categories
Newsletter
HomeAboutPeopleCategoriesLoginCreate

Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval
3

3 Months Career Transition into AI Safety Research & Software Engineering

Technical AI safety
seanpetersau avatar

Sean Peters

ProposalGrant
Closes October 24th, 2025
$32,000raised
$10,000minimum funding
$32,000funding goal
Fully funded and not currently accepting donations.

Project summary

I’m seeking career-transition support to work 4 days/week in AI safety with a focus on dangerous capability evaluation and control.

My near-term research interest is in evaluation methodology to measure model covert capability time horizons against control protocol monitors. With a first exploration into cyber-offense covert tasks during ordinary software work.

I have a work-in-progress proposal that sketches a broader agenda. Early feedback has been positive but suggested a significant narrowing of scope, which I’m doing.

What are this project's goals? How will you achieve them?

North Star: Contribute to robust, decision-useful metrics of model covert capability and control protocols.

  • How much covert work models can do, while working on legitimate assigned tasks without detection, aligned to real risk profiles (e.g., loss-of-control)? 

  • How do Control Protocols change the scope and profile of model covert capabilities?

  • The time-horizon framing (“covert horizons”) is my guiding hypothesis here.

METR’s time-horizon methodology coupled with AI Control primitives provides us with a useful approach to measuring model covert capability performance – measuring the time horizons at which an untrusted model U ,can achieve a covert objective C, while completing a legitimate task T, without detection under a monitor M

Goals (examples, not commitments)

  • Build research expertise and early research outputs to contribute to this North Star.

    • Build a small, tightly specified dataset of tasks T and C, pairing ordinary software tasks with cyber-offense side goals.

    • Iterate on the dataset and time horizon methodology until they reliably elicit the target covert behaviour (finish the main task and complete the side goal without detection) and I have high-conviction evidence of soundness, measurement validity, and the intuition to explain why.

    • Run early pre-pilot human evaluation probes e.g. self-tests or N=1 contractor evaluations to sanity-check the above before considering anything larger.

  • Contribute to control and evaluation open-source software ecosystem e.g. ControlArena and Inspect

How I’ll work

  • Stay flexible and pivot toward the highest-leverage track (e.g., task creation, baselining, or even research agenda pivots)

  • Maintain regular contact with mentors/collaborators (AI control, model evaluation, cybersecurity, statistics)

  • Prioritise measurement validity over scale: keep dataset small and conditions limited; aim for high internal validity and repeatability within the “covert horizons” framing.

  • Being deliberately mindful of: elicitation and refusals/sandbagging; agent setup and scaffolding; token/context budgets; task-pairing dynamics; construct validity; and the unknown-unknowns that I anticipate will surface along the way.

How will this funding be used?

  • 3 months, total $32,000 USD, start Nov 1

  • Stipend: $30,000 (4 days/week; equipment/tooling folded in)

  • Model API Credits: $2,000

Who is on your team? What's your track record on similar projects?

Team: just me (Blue Mountains/Sydney, Australia)

Track record:

  • Over a decade of experience as a software engineer working at the intersection of a variety of complex scientific domains (cell culture, proteomics, microkernels, astronomy)

  • For the last four years I've worked at Vow, who have grown from a small-time Australian startup into the world leaders in cell cultured meat.

  • Early work related to my targeted research agenda include

    • AI Task Length Horizons in Offensive Cybersecurity [Complete]

    • Measuring Deceptive Capability with Human Time Horizons and Control Protocols [Draft project proposal]

Transparency

  • I have not worked in a full-time researcher role in a very long time. Always either a research engineer or research adjacent software engineer.

  • I have never worked in an independently funded capacity.

What are the most likely causes and outcomes if this project fails?

Causes

  • Isolation/time-zone: Working from (semi-rural) Australia makes it harder to stay plugged into the research loop. I may fail to engage well with collaborators and mentors.

  • Ramp-up risk: There are a number of domains that will require deepening of expertise and I may struggle; AI Control, cybersecurity, statistical methods

Mitigations

  • Regular and multiple mentor/collaborator catch-ups across AI control, eval, cybersecurity, and statistics.

  • A small weekly report mailing list for high-quality and interested readers who feel comfortable giving me very candid feedback!

Outcomes

  • I don’t ship useful research or code, and I don’t connect meaningfully with the community.

  • I would pivot back to safety-aligned software engineering roles.

  • If that isn’t viable, I would need to pivot out of AI safety

How much money have you raised in the last 12 months, and from where?

$0.

  • Pending: UK Alignment Fund (submitted)

  • Pending: Open Phil career-transition (in progress).

If this grant (or any other grant) meets my funding needs, I’ll coordinate/withdraw to avoid double funding.


Comments1Offers1Similar6
mfatt avatar

Matthew Farr

MoSSAIC

Probing possible limitations and assumptions of interpretability | Articulating evasive risk phenomena arising from adaptive and self modifying AI

Science & technologyTechnical AI safetyAI governanceGlobal catastrophic risks
1
0
$0 raised
SandyFraser avatar

Sandy Fraser

Concept-anchored representation engineering for alignment

New techniques to impose minimal structure on LLM internals for monitoring, intervention, and unlearning.

Technical AI safetyGlobal catastrophic risks
3
1
$0 raised
LawrenceC avatar

Lawrence Chan

Exploring novel research directions in prosaic AI alignment

3 month

Technical AI safety
5
9
$30K raised
CarlosGiudice avatar

Carlos Rafael Giudice

Cash runway while I go through interviews/wait for OpenPhil's grant decision

I've self funded my ramp up for six months and interview/grant processes are taking longer than expected.

Technical AI safetyGlobal catastrophic risks
2
0
$0 raised
🥥

Alex Lintz

Funding for AI safety comms strategy & career transition support

Mostly retroactive funding for prior work on AI safety comms strategy as well as career transition support. 

AI governanceLong-Term Future FundGlobal catastrophic risks
4
5
$39K raised
AmritanshuPrasad avatar

Amritanshu Prasad

General support for research activities

Funding for my work across AI governance and policy research

AI governanceGlobal catastrophic risks
3
3
$0 raised