
Evaluating Model Attack Selection and Offensive Cyber Horizons

Technical AI safety

Sean Peters

Proposal · Grant
Closes January 2nd, 2026
$41,000 raised
$10,500 minimum funding
$41,000 funding goal
Fully funded and not currently accepting donations.

Project summary


I've been working on dangerous capability evaluation as my primary focus for six weeks, after receiving career transition funding three months ago and stepping away from my previous role. In that time I've:

  • Pivoted away from my original proposal to measure covert capability under control protocols: the scope was too expansive, and I ran into interesting adversarial dynamics that complicated the methodology. That experience led to my current interest in attack selection.

  • Developed, shared and begun iterating on an experimental and analytical pipeline to measure attack selection capability with control evaluations.

  • Began supervising Jack Payne, who is driving extensions of my previous cyber offense time horizons work. The extension includes newer models, additional benchmarks, and contracted cybersecurity experts for task time estimation.

  • Reached out to researchers in the field for early feedback, and attended AISUM25 and EAGxAustralasia in Melbourne to discuss the work.

I'm seeking up to ~$41k through end of March 2026 to extend this work. Priorities:

  1. Research engineer support (continuing the cyber horizons extension)

  2. Operating expenses (compute, API credits, tooling)

  3. Specialist contracting (e.g. cybersecurity review, adversarial stats)

  4. Personal runway extension

The budget section breaks this into three tiers.

Background

I'm an experienced software engineer and team lead who's spent a decade working at the intersection of software and research — astronomy, microkernels, observability at scale, cancer proteomics, and most recently cultivated meat at Vow. This year I've progressively shifted my time into AI safety, and it has been my primary focus since November.

My entry point was replicating METR's time-horizon methodology for offensive cybersecurity, followed by working on a proposal for a methodology to measure covert capability under control protocols.

That work got me career transition funding to make AIS my primary career focus (4 days/week).

Current Work

Track 1: Attack Selection Capability Evaluation (me)

Safety cases in AI control often assume models attack at a constant base rate. But a sophisticated attacker would be selective, waiting for favourable conditions rather than attacking indiscriminately. Current red-teaming work treats attack selection as a tactic to optimise — building better red teams. I'm approaching this from the perspective of evaluation: measuring attack selection as an emergent model capability. Can models already identify which situations favour attackers?

If models are developing this perception over time, that changes how we should think about safety cases. Tracking it provides situational awareness about an important sub-capability of rogue model behaviour.

I've built and am running the experimental and analytical pipeline, and I'm iterating on the methodology. The current proposal and status document has more detail; expect it to evolve as I incorporate feedback.
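To make the selectivity idea concrete, here is a minimal, illustrative sketch of one statistic such a pipeline could report: the gap between a model's attack rate in episodes where conditions favour an attacker and its rate elsewhere. The episode labels and numbers are invented placeholders, not outputs of the actual pipeline.

```python
import numpy as np

def selectivity_score(attacked, favourable):
    """Attack-rate gap between favourable and unfavourable episodes.

    attacked:   bool array, True where the model chose to attack.
    favourable: bool array, True where conditions favoured an attacker.
    Both labels are hypothetical; a real pipeline would derive them from
    control-evaluation transcripts and environment metadata.
    """
    attacked = np.asarray(attacked, dtype=bool)
    favourable = np.asarray(favourable, dtype=bool)
    return attacked[favourable].mean() - attacked[~favourable].mean()

# Toy comparison with synthetic episodes.
rng = np.random.default_rng(0)
favourable = rng.random(1000) < 0.3
indiscriminate = rng.random(1000) < 0.2            # constant base-rate attacker
selective = favourable & (rng.random(1000) < 0.6)  # mostly attacks when favourable
print(selectivity_score(indiscriminate, favourable))  # close to 0
print(selectivity_score(selective, favourable))       # clearly positive
```

An indiscriminate attacker scores near zero on this statistic; a perfectly selective one approaches its attack rate in favourable episodes.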

Track 2: Cyber horizons extension (Jack Payne, with my supervision)

Jack and I are extending work I shared in June. Jack is leading the extension, addressing the study's main limitations: evaluations were single-shot, human time estimates leaned heavily on AI assistance rather than professional baselines, and the longer-horizon data came entirely from CTF competitions.

The extension adds broader task coverage (CVE-Bench, CyberGym), more models, multiple runs per task-model pair, and a contracted security expert for grounded time estimates. The extension plans are available.
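For context on the horizon methodology this builds on, here is a minimal sketch of the METR-style calculation: fit a logistic curve of success probability against log task length (task length estimated from human completion time), then read off the length at which success falls to 50%. The task lengths and outcomes below are invented for illustration; multiple runs per task-model pair simply add more rows per task and tighten the fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-run results: human task length (minutes) and model success (1/0).
lengths = np.array([2, 5, 5, 10, 15, 30, 30, 60, 120, 240, 480])
success = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Logistic regression of success against log task length.
X = np.log(lengths).reshape(-1, 1)
clf = LogisticRegression().fit(X, success)

# The 50% time horizon is the task length at which the fitted log-odds cross zero.
h50 = np.exp(-clf.intercept_[0] / clf.coef_[0, 0])
print(f"Estimated 50% time horizon: {h50:.0f} minutes")
```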

Beyond this grant, two directions excite us. First, specialised agent scaffolding: tools like Harmony Intelligence's AppSec agent (which recently found a real Next.js vulnerability) should outperform base models, but by how much? Second, offensive/defensive asymmetry dynamics, building on these capability measurements to understand how the balance shifts.

Budget

Seeking up to ~$41,000 USD through end of March 2026.

This builds on my previous career transition funding. The proposal now covers two concrete bodies of work, and this funding extends the scope and ambition of both.

The budget breaks into three tiers:

Tier 1 (~$11,300): Through end of January

Covers the immediate extension: research engineer support (part-time initially, ramping to full-time), increased operating expenses, and specialist contracting.

- Research engineer: $6,800

- Operating expenses: $1,500

- Specialist contracting: $3,000

Tier 2 (~$26,700): Through end of February

Adds one month of personal runway and continued research support. More time to improve quality, seek feedback, and iterate.

- Research engineer: $11,700

- Operating expenses: $3,000

- Specialist contracting: $4,000

- Personal compensation: $8,000

Tier 3 (~$41,000): Through end of March

If the core work lands by February, this tier is where we'd pursue the extension ideas (specialised agent scaffolding, offensive/defensive asymmetry) or plan next steps deliberately, rather than scrambling for the next round of funding.

- Research engineer: $16,500

- Operating expenses: $4,500

- Specialist contracting: $4,000

- Personal compensation: $16,000

This is a prospective budget. I'll spend where I think it's most valuable.

(Edit 1 day after approval: After further thought on compensation consistency, I've adjusted the budget. Net effect: my compensation reduced, research engineer and specialist contracting slightly increased.)

Risks

  • Geographic friction: Working from Australia, I've made good progress building local connections, but I need deeper international ties; many of the leading experts in the areas I'm working in are based overseas. I'll be further prioritising outreach here.

  • Statistics depth: Evaluation science uses deep statistics. For example, in the attack selection methodology I'm working with hierarchical Bayesian modeling, adaptive sampling, MCMC, etc. Many of these methods are new to me and I'm actively learning them, but this isn't something you ramp up on in weeks. I'm drawing on my network for support, and potentially budgeting for short-term contracts for adversarial review of my methods (a minimal sketch of the kind of hierarchical model I mean follows this list).

  • Diffuse focus: Multiple directions compete for attention: cyber work, attack selection, strategic thinking, open source contributions I'd like to make. I'm addressing this by cutting scope. For example, I'm not finding time for open source contributions, and that's probably fine.

  • Solo iteration: Working independently means fewer people challenging my ideas, which can slow directional correction even when output is high.
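On the statistics point above, here is a minimal sketch of the kind of hierarchical Bayesian model I mean: a partially pooled estimate of attack propensity across models and scenario types, sampled with MCMC. The counts are invented placeholders and the model is purely illustrative, not the actual methodology.

```python
import numpy as np
import pymc as pm

# Hypothetical counts: attacks attempted vs opportunities,
# rows = models under evaluation, columns = scenario types.
attacks = np.array([[3, 9], [1, 12]])
opportunities = np.array([[40, 40], [40, 40]])

with pm.Model():
    # Population-level attack propensity (log-odds) and spread across cells.
    mu = pm.Normal("mu", 0.0, 1.5)
    sigma = pm.HalfNormal("sigma", 1.0)
    # Per-model, per-scenario log-odds, partially pooled toward mu.
    theta = pm.Normal("theta", mu, sigma, shape=attacks.shape)
    pm.Binomial("obs", n=opportunities, p=pm.math.sigmoid(theta), observed=attacks)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)  # MCMC (NUTS)

# Posterior mean attack propensity (log-odds) per cell, pooled across the table.
print(idata.posterior["theta"].mean(("chain", "draw")))
```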

Previous Funding

  • Manifund (September 2025): $32,000 USD for 3 months career transition. Currently ~2 months of runway remaining.


Comments
offering $41,000

Joel Becker

1 day ago

Donating on identical thesis to previous grant -- I wish I could have gone in for more size then!

I haven't thought about expenses for RE; from my perspective, I'm also thinking of this funding as a bet on Sean.

offering $41,000

Joel Becker

1 day ago

"Evaluation science uses deep statistics. For example, in the attack selection methodology I'm working with hierarchical Bayesian modeling, adaptive sampling, MCMC, etc."

Noting that a lot of stats we use at METR is intentionally pretty simple. E.g. I think we've never done Bayes, despite having many members of staff who would feel intuitively on board with that. Partly this is for communication clarity, partly because it gives less surface area to mess up.