
Funding requirements

  • Sign grant agreement
  • Reach min funding
  • Get Manifund approval

Protecting GPU LLM inference from model-weight and IP theft

Science & technology · Technical AI safety · EA community · Global catastrophic risks

Anthony Etim

Proposal · Grant
Closes March 9th, 2026
$0 raised
$35,000 minimum funding
$80,000 funding goal


Project summary

TL;DR

Frontier model weights are a high-value target. I’m building deployable detection and response for sustained, covert model-weight/IP exfiltration during GPU-hosted LLM inference—aimed at reducing capability proliferation pathways relevant to catastrophic risk.

Problem & why it matters (GCR relevance)

Model-weight theft is uniquely dangerous because it can bypass governance and compute controls, enabling replication and misuse of frontier capabilities at scale. GPU inference infrastructure is a high-leverage target due to (a) shared microarchitecture and (b) measurable emissions (power/EM/thermal/acoustic), which can be exploited for covert leakage. The key asymmetry is that successful weight theft likely requires long-duration, sustained exfiltration—which makes detection-first defenses unusually promising (“detect once to disrupt”).
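
To make the asymmetry concrete, a back-of-envelope calculation (illustrative numbers only, not measurements from this project) shows how long sustained exfiltration of full model weights would take at plausible covert-channel bitrates:

```python
# Back-of-envelope: time to exfiltrate full model weights over a sustained
# covert channel. All numbers are illustrative placeholders.
def exfil_days(n_params: float, bits_per_param: float, channel_bps: float) -> float:
    """Days needed to move every weight bit at a sustained covert-channel bitrate."""
    total_bits = n_params * bits_per_param
    return total_bits / channel_bps / 86_400  # 86,400 seconds per day

# A hypothetical 70B-parameter model stored at 16 bits per parameter:
for bps in (1e3, 1e5, 1e7):  # 1 kbit/s, 100 kbit/s, 10 Mbit/s
    print(f"{bps:>8.0e} bit/s -> {exfil_days(70e9, 16, bps):>8,.0f} days")
```

At side-channel-scale bitrates the attacker needs weeks to years of uninterrupted leakage, so the defender only has to detect the channel once during that window to disrupt the theft.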

What are this project's goals? How will you achieve them?

I will build a detection and containment framework that can run continuously in production settings.

Workstream A — Threat taxonomy & measurable targets

  • Define practical threat models (insider, co-tenant, supply-chain, remote) and deployment assumptions (e.g., telemetry availability).

  • Identify the most plausible side/covert channels for model-weight/IP exfiltration in GPU inference settings.

  • Define evaluation metrics: exfiltration bandwidth bounds, detection latency, false positive rate, and deployment overhead.
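
To make these metrics concrete, the sketch below scores a detector run from labeled telemetry windows; the (timestamp, attack_active, alert_fired) window format is an assumption for illustration, not a finalized interface:

```python
# Sketch of two of the evaluation metrics (detection latency, false positive
# rate) over labeled telemetry windows. The window format is a placeholder.
from typing import Optional, Sequence, Tuple

Window = Tuple[float, bool, bool]  # (timestamp_s, attack_active, alert_fired)

def false_positive_rate(windows: Sequence[Window]) -> float:
    """Fraction of attack-free windows on which the detector raised an alert."""
    benign = [w for w in windows if not w[1]]
    return sum(1 for w in benign if w[2]) / max(len(benign), 1)

def detection_latency_s(windows: Sequence[Window]) -> Optional[float]:
    """Seconds from the start of the attack to the first alert raised during it."""
    attack_start = next((t for t, attack, _ in windows if attack), None)
    first_alert = next((t for t, attack, alert in windows if attack and alert), None)
    if attack_start is None or first_alert is None:
        return None  # no attack present, or the attack was never detected
    return first_alert - attack_start
```

Exfiltration-bandwidth bounds come from Workstream B's measurements; deployment overhead would be scored as added serving latency and throughput cost.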

Workstream B — Empirical leakage benchmarking

  • Implement measurement experiments to estimate leakage curves and capacity upper bounds under realistic serving configurations.

  • Output: “what actually leaks, at what rate, under which assumptions” (decision-relevant for operators).
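
As an illustration of how raw traces could be turned into leakage-rate numbers, the sketch below computes a plug-in mutual-information estimate between the bits an attacker modulates and a discretized observable (e.g., one power or utilization sample per symbol period); the data, bin count, and symbol rate are synthetic placeholders:

```python
# Sketch: empirical leakage estimate in bits/symbol via plug-in mutual
# information between modulated covert bits and a discretized observable.
# Synthetic data; every parameter below is a placeholder.
import numpy as np

def mutual_information_bits(sent_bits: np.ndarray, observed: np.ndarray, n_bins: int = 16) -> float:
    """Plug-in estimate of I(sent; observed) in bits, discretizing the observable."""
    edges = np.histogram_bin_edges(observed, bins=n_bins)
    obs_binned = np.digitize(observed, edges)              # values in 0..n_bins+1
    joint = np.zeros((2, n_bins + 2))
    for b, o in zip(sent_bits, obs_binned):
        joint[int(b), o] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px * py))
    return float(np.nansum(terms))

# Synthetic example: the observable only weakly tracks the transmitted bit.
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=20_000)
obs = bits * 0.3 + rng.normal(0.0, 1.0, size=bits.size)    # weak signal in noise
symbol_rate = 1_000                                        # symbols/s, placeholder
leak = mutual_information_bits(bits, obs)
print(f"~{leak:.3f} bits/symbol  ->  ~{leak * symbol_rate:.0f} bit/s leakage estimate")
```

The true mutual information under one input distribution lower-bounds channel capacity, and the plug-in estimate of it is itself biased; producing defensible capacity upper bounds requires the channel modeling this workstream plans.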

Workstream C — Always-on detector (two-stage)

  • Stage 1 (always-on): lightweight telemetry fusion over stable features (GPU utilization patterns, perf counters if available, scheduling/batching signatures, memory behavior proxies).

  • Stage 2 (confirm-before-action): higher-precision confirmation (optionally including external sensing like power/EM) to keep the overall false positive rate extremely low.

  • Target: detector tuned for ultra-low false positives so it can be continuously deployed.
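
A minimal sketch of the Stage-1 loop, assuming NVML (pynvml / nvidia-ml-py) as one available telemetry source and reducing telemetry fusion to a single rolling z-score for brevity; every threshold is a placeholder, and the Stage-2 escalation callback is hypothetical:

```python
# Stage-1 always-on sketch: poll cheap GPU telemetry, keep a rolling baseline,
# and escalate to Stage-2 confirmation only after a sustained deviation.
# Features, thresholds, and the escalation hook are placeholders.
import time
from collections import deque
from statistics import mean, pstdev

import pynvml  # NVIDIA Management Library bindings (nvidia-ml-py)

WINDOW = 300        # samples in the rolling baseline (placeholder)
Z_THRESHOLD = 6.0   # deliberately conservative to keep false positives rare
SUSTAINED = 10      # consecutive anomalous samples before escalating

def sample(handle):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    return (util.gpu, util.memory, power_w)

def stage1_loop(escalate_to_stage2):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    history = deque(maxlen=WINDOW)
    streak = 0
    while True:
        feats = sample(handle)
        if len(history) == WINDOW:
            powers = [h[2] for h in history]
            sigma = pstdev(powers) or 1e-6          # avoid divide-by-zero
            z = abs(feats[2] - mean(powers)) / sigma
            streak = streak + 1 if z > Z_THRESHOLD else 0
            if streak >= SUSTAINED:
                escalate_to_stage2(feats)           # confirm before any action
                streak = 0
        history.append(feats)
        time.sleep(1.0)
```

The point of the structure (and of the high threshold plus the sustained-streak requirement) is that Stage 1 only has to be cheap and rarely wrong, because no disruptive action is taken until Stage 2 confirms.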

Workstream D — Containment playbooks

  • Translate detections into operational actions: alert → confirm → throttle/isolate → randomized defenses / re-keying / model shard rotation (depending on deployment).

  • Evaluate residual leakage under mitigations and measure throughput impact.
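
The playbook can be made explicit as a small state machine, sketched below; the action hooks (throttling, re-keying, shard rotation) are named as placeholders since the real hooks depend on the serving stack:

```python
# Sketch of the containment playbook as a state machine:
# MONITOR -> ALERT -> (Stage-2 confirm) -> CONTAINED -> MITIGATED.
# The `actions` object is a hypothetical deployment-specific hook interface.
from enum import Enum, auto

class State(Enum):
    MONITOR = auto()    # Stage-1 watching, nothing suspicious confirmed
    ALERT = auto()      # Stage-1 flagged; awaiting Stage-2 confirmation
    CONTAINED = auto()  # confirmed; traffic throttled or workload isolated
    MITIGATED = auto()  # longer-term defenses applied

def step(state: State, stage2_confirms: bool, actions) -> State:
    """Advance the playbook by one decision step."""
    if state is State.ALERT:
        if not stage2_confirms:
            return State.MONITOR            # unconfirmed: no disruptive action
        actions.throttle_or_isolate()       # placeholder operator hook
        return State.CONTAINED
    if state is State.CONTAINED:
        actions.apply_mitigations()         # e.g., randomized defenses,
        return State.MITIGATED              # re-keying, shard rotation (placeholders)
    return state
```

Keeping the "no action before confirmation" rule inside the state machine also makes it straightforward to measure residual leakage under each mitigation and the throughput cost of invoking it.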


Testbed / feasibility

I will use an existing testbed with H100 PCIe, power capture instrumentation, and EM collection capability, enabling fast iteration on measurements and detector validation.

Milestones (12 months)

  • Months 1–2: Finalize threat model, metrics, and baseline serving setup; publish v0 taxonomy outline.

  • Months 3–5: Produce initial leakage measurements and first capacity bounds on priority channels.

  • Months 6–8: Release Stage-1 detector prototype and baseline evaluation (overhead, latency, FPR).

  • Months 9–10: Integrate Stage-2 confirmation path and validate “confirm-before-action” workflow.

  • Months 11–12: Package artifacts + operator playbooks; write up results; plan responsible disclosure where needed.

Success metrics (how I’ll measure progress)

  • Decision-relevant measurement: empirical leakage/capacity bounds for plausible channels in realistic inference settings.

  • Deployability: always-on detector overhead compatible with production serving constraints.

  • Reliability: extremely low false-positive regime (with confirm-before-action) and clear detection latency.

  • Actionability: containment playbooks that measurably reduce residual leakage.


Expected outputs

  • A public technical report / paper with empirical results and recommended defenses.

  • A prototype monitoring tool (and evaluation harness) usable by AI infra/security teams.

  • Operator playbooks for detection → response, plus a responsible disclosure plan if vulnerabilities are identified.

How will this funding be used?

Funding supports a final 12-month PhD completion period (June 1, 2026 – May 31, 2027) plus modest project costs:

  • Primary: PhD stipend to execute the work end-to-end.

  • Secondary: compute for experiments/evaluation; dissemination (conference travel, publication costs).

  • If partially funded: prioritize research time and core measurements/detector first; defer travel/publication.

Who is on your team? What's your track record on similar projects?

  • Mentor: Jakub Szefer - Associate Professor of Electrical and Computer Engineering at Northwestern University; works on hardware security and secure architectures.

    LinkedIn: https://www.linkedin.com/in/jakub-szefer/

    Google Scholar: https://scholar.google.com/citations?hl=en&user=NO1Je2kAAAAJ

  • Mentor: Wenjie Xiong - Assistant Professor of Electrical and Computer Engineering at Virginia Tech; works on hardware security and trustworthy computing.

    LinkedIn: https://www.linkedin.com/in/wenjie-xiong-2a85a63a/

    Google Scholar: https://scholar.google.com/citations?hl=en&user=07UMduYAAAAJ

  • Advisor: Gabriel Kulp - Technology and Security Policy Fellow at RAND; works on secure compute infrastructure and verification/governance mechanisms.

    RAND publications: https://www.rand.org/pubs/authors/k/kulp_gabriel.html

  • Primary author: Anthony Etim - Yale PhD candidate (hardware security and ML systems); I will lead implementation, experiments, and analysis full-time.

    LinkedIn: https://www.linkedin.com/in/anthony-etim/

    Google Scholar: https://scholar.google.com/citations?hl=en&user=rLp18joAAAAJ

What are the most likely causes and outcomes if this project fails?

Likely failure modes

  • Practical leakage rates for certain channels are lower than expected under realistic noise/isolation.

  • Telemetry availability is insufficient in some production deployments.

  • Some mitigations impose unacceptable throughput/latency overhead.

If it “fails,” what still comes out

  • A defensible taxonomy and empirical bounds that help operators prioritize mitigations (even negative results are decision-relevant).

  • A telemetry-only baseline monitor and guidance on which signals are actually useful and when external sensing is worth it.

How much money have you raised in the last 12 months, and from where?

$0

Similar projects

  • Jared Johnson - Beyond Compute: Persistent Runtime AI Behavioral Conditioning w/o Weight Changes
    Runtime safety protocols that modify reasoning, without weight changes. Operational across GPT, Claude, Gemini with zero security breaches in classified use
    Science & technology · Technical AI safety · AI governance · Global catastrophic risks · $0 raised

  • Anthony Ware - Shallow Review of AI Governance: Mapping the Technical–Policy Implementation Gap
    Identifying operational bottlenecks and cruxes between alignment proposals and executable governance.
    Technical AI safety · AI governance · Global catastrophic risks · $0 raised

  • Aditya Raj - 6-month research funding to challenge current AI safety methods
    Current LLM safety methods treat harmful knowledge as removable chunks. This is controlling a model, and it does not work.
    Technical AI safety · Global catastrophic risks · $0 raised

  • Krishna Patel - Isolating CBRN Knowledge in LLMs for Safety - Phase 2 (Research)
    Expanding proven isolation techniques to high-risk capability domains in Mixture of Experts models
    Technical AI safety · Biomedical · Biosecurity · $150K raised

  • Furkan Elmas - Exploring a Single-FPS Stability Constraint in LLMs (ZTGI-Pro v3.3)
    Early-stage work on a small internal-control layer that tracks instability in LLM reasoning and switches between SAFE / WARN / BREAK modes.
    Science & technology · Technical AI safety · $0 raised

  • James Lucassen - More Detailed Cyber Kill Chain For AI Control Evaluation
    Extending an AI control evaluation to include vulnerability discovery, weaponization, and payload creation
    Technical AI safety · $0 raised