Frontier model weights are a high-value target. I’m building deployable detection and response for sustained, covert model-weight/IP exfiltration during GPU-hosted LLM inference, aimed at cutting off capability-proliferation pathways relevant to catastrophic risk.
Model-weight theft is uniquely dangerous because it can bypass governance and compute controls, enabling replication and misuse of frontier capabilities at scale. GPU inference infrastructure is a high-leverage target due to (a) shared microarchitecture and (b) measurable emissions (power/EM/thermal/acoustic), both of which can be exploited for covert leakage. The key asymmetry favors defenders: frontier weights run to hundreds of gigabytes while covert channels are typically low-bandwidth, so successful weight theft likely requires long-duration, sustained exfiltration. That makes detection-first defenses unusually promising (“detect once to disrupt”).
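To see the scale of this asymmetry, here is a back-of-the-envelope sketch; the model size and channel bandwidths are illustrative assumptions, not measurements from this project:

```python
# Exfiltration duration = model size (bits) / channel bandwidth (bits/s).
# All numbers are illustrative assumptions, not measured values.

SECONDS_PER_DAY = 86_400

def exfil_days(weight_bytes: float, channel_bps: float) -> float:
    """Days of sustained leakage needed to move weight_bytes at channel_bps."""
    return weight_bytes * 8 / channel_bps / SECONDS_PER_DAY

weights = 1.4e12                 # ~1.4 TB: a hypothetical ~700B-param model at 2 bytes/param
for bps in (1e3, 1e6, 1e9):      # slow covert channel .. fast bulk exfiltration
    print(f"{bps:10.0e} bit/s -> {exfil_days(weights, bps):12.1f} days")
```

At kilobit-per-second covert-channel rates the transfer takes centuries; even at a megabit per second it takes months of uninterrupted leakage, which is precisely the window a continuously running detector can exploit.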
I will build a detection and containment framework that can run continuously in production settings.
Workstream A — Threat taxonomy & measurable targets
Define practical threat models (insider, co-tenant, supply-chain, remote) and deployment assumptions (e.g., telemetry availability).
Identify the most plausible side/covert channels for model-weight/IP exfiltration in GPU inference settings.
Define evaluation metrics: exfiltration bandwidth bounds, detection latency, false positive rate, and deployment overhead.
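To make these metrics concrete, here is a minimal sketch of the per-experiment record the evaluation harness could emit; the field names are assumptions, not a finalized schema:

```python
from dataclasses import dataclass

@dataclass
class ExfilEvalRecord:
    """One evaluation run of a (channel, detector, mitigation) combination.

    Field names are illustrative; the real schema is a Workstream A deliverable.
    """
    channel: str                 # e.g. "power", "em", "scheduler-timing"
    bandwidth_bps_upper: float   # estimated upper bound on leakage bandwidth
    detection_latency_s: float   # time from leakage onset to confirmed alert
    false_positive_rate: float   # false alerts per benign serving-hour
    overhead_pct: float          # added serving latency/throughput cost (%)
```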
Workstream B — Empirical leakage benchmarking
Implement measurement experiments to estimate leakage curves and capacity upper bounds under realistic serving configurations.
Output: “what actually leaks, at what rate, under which assumptions” (decision-relevant for operators).
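One standard first-pass estimator for such leakage curves, sketched under the assumption of a discrete memoryless channel: probe the channel repeatedly, build a sent-vs-decoded confusion matrix, and scale the empirical mutual information per symbol by the symbol rate. Note this yields an achievable rate; the capacity upper bounds the workstream targets require channel modeling on top of it.

```python
import numpy as np

def empirical_leak_rate(counts: np.ndarray, symbols_per_sec: float) -> float:
    """Estimate leakage rate (bits/s) from a sent-vs-decoded confusion matrix.

    counts[i, j] = trials where symbol i was sent and symbol j was decoded.
    Returns I(X;Y) * symbol_rate under the empirical probing distribution,
    i.e. an achievable rate, not yet an upper bound on channel capacity.
    """
    joint = counts / counts.sum()            # empirical joint P(x, y)
    px = joint.sum(axis=1, keepdims=True)    # marginal P(x), shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)    # marginal P(y), shape (1, n)
    indep = px @ py                          # P(x)P(y) under independence
    nz = joint > 0                           # skip log(0) terms
    mi_bits = float(np.sum(joint[nz] * np.log2(joint[nz] / indep[nz])))
    return mi_bits * symbols_per_sec

# Illustrative numbers: a noisy 2-symbol channel probed at 10 symbols/s.
counts = np.array([[90, 10],
                   [20, 80]], dtype=float)
print(f"{empirical_leak_rate(counts, 10.0):.2f} bits/s")  # ~3.97 bits/s
```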
Workstream C — Always-on detector (two-stage)
Stage 1 (always-on): lightweight telemetry fusion over stable features (GPU utilization patterns, perf counters if available, scheduling/batching signatures, memory behavior proxies).
Stage 2 (confirm-before-action): higher-precision confirmation (optionally including external sensing like power/EM) to keep the overall false positive rate extremely low.
Target: a detector tuned for ultra-low false positives so it can be continuously deployed; a minimal sketch of the two-stage flow follows below.
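A minimal sketch of that flow, with placeholder features and thresholds (the deployed design is a Workstream C deliverable): Stage 1 tracks a rolling benign baseline over a cheap telemetry signal and escalates only on sustained deviation; Stage 2 gates any action behind a higher-precision confirmation step:

```python
from collections import deque
import statistics

class TwoStageDetector:
    """Sketch of Stage-1 screening plus Stage-2 confirm-before-action.

    `feature` is any cheap scalar telemetry signal (e.g. a GPU-utilization or
    batching-signature statistic); window and thresholds are illustrative only.
    """

    def __init__(self, window: int = 512, z_thresh: float = 4.0, confirm_n: int = 30):
        self.baseline = deque(maxlen=window)  # rolling benign baseline
        self.z_thresh = z_thresh              # Stage-1 deviation threshold
        self.confirm_n = confirm_n            # sustained hits before Stage 2
        self.hits = 0

    def observe(self, feature: float) -> str:
        if len(self.baseline) < 64:           # warm-up: just learn the baseline
            self.baseline.append(feature)
            return "learning"
        mu = statistics.fmean(self.baseline)
        sd = statistics.stdev(self.baseline) or 1e-9
        z = abs(feature - mu) / sd
        if z < self.z_thresh:                 # benign: refresh baseline, reset streak
            self.baseline.append(feature)
            self.hits = 0
            return "ok"
        self.hits += 1                        # deviation must be sustained, not one-off
        if self.hits < self.confirm_n:
            return "stage1-candidate"
        return "stage2-confirm"               # hand off to high-precision confirmation
                                              # (e.g. power/EM capture) before acting
```

Requiring a sustained streak before escalation keeps Stage 1 cheap while pushing the false-positive burden onto the rarer, more expensive Stage-2 check.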
Workstream D — Containment playbooks
Translate detections into operational actions: alert → confirm → throttle/isolate → randomized defenses / re-keying / model shard rotation (depending on deployment); a minimal state-machine encoding is sketched after this list.
Evaluate residual leakage under mitigations and measure throughput impact.
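One way to encode that escalation ladder so it can be tested and audited; the states, events, and actions below are illustrative placeholders, since real playbooks depend on the serving stack:

```python
from enum import Enum, auto

class State(Enum):
    MONITOR = auto()
    ALERTED = auto()
    CONFIRMED = auto()
    CONTAINED = auto()

# Playbook: each (state, event) pair maps to (next state, operator action).
# Actions are placeholders; real ones depend on the deployment.
PLAYBOOK = {
    (State.MONITOR,   "stage1_alert"):    (State.ALERTED,   "start Stage-2 capture"),
    (State.ALERTED,   "stage2_negative"): (State.MONITOR,   "log and resume"),
    (State.ALERTED,   "stage2_positive"): (State.CONFIRMED, "throttle/isolate tenant"),
    (State.CONFIRMED, "operator_ack"):    (State.CONTAINED, "re-key / rotate model shards"),
}

def step(state: State, event: str) -> State:
    nxt, action = PLAYBOOK[(state, event)]
    print(f"{state.name} --{event}--> {nxt.name}: {action}")
    return nxt

s = step(State.MONITOR, "stage1_alert")
s = step(s, "stage2_positive")
s = step(s, "operator_ack")
```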
I will use an existing testbed with an H100 PCIe GPU, power-capture instrumentation, and EM-collection capability, enabling fast iteration on measurements and detector validation.
Months 1–2: Finalize threat model, metrics, and baseline serving setup; publish v0 taxonomy outline.
Months 3–5: Produce initial leakage measurements and first capacity bounds on priority channels.
Months 6–8: Release Stage-1 detector prototype and baseline evaluation (overhead, latency, FPR).
Months 9–10: Integrate Stage-2 confirmation path and validate “confirm-before-action” workflow.
Months 11–12: Package artifacts + operator playbooks; write up results; plan responsible disclosure where needed.
Decision-relevant measurement: empirical leakage/capacity bounds for plausible channels in realistic inference settings.
Deployability: always-on detector overhead compatible with production serving constraints.
Reliability: an extremely low false-positive regime (via confirm-before-action) and well-characterized detection latency; a worked false-positive example follows this list.
Actionability: containment playbooks that measurably reduce residual leakage.
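To illustrate why confirm-before-action is the lever here (all numbers are assumptions): if the two stages err independently on benign traffic, their false-positive rates multiply, so a cheap, moderately noisy Stage 1 combined with a rare Stage 2 can reach an actionable regime:

```python
# Composed false-positive behavior of a two-stage detector (illustrative numbers).
stage1_fpr = 1e-3          # fraction of benign windows Stage 1 flags (assumed)
stage2_fpr = 1e-2          # fraction of Stage-1 flags Stage 2 wrongly confirms (assumed)
decisions_per_day = 86_400 # e.g. one Stage-1 decision per second

combined_fpr = stage1_fpr * stage2_fpr        # assumes independent errors
false_actions_per_day = combined_fpr * decisions_per_day
print(f"combined FPR = {combined_fpr:.0e}; "
      f"~{false_actions_per_day:.1f} false containment actions/day")
```

Under these assumptions that is under one false containment action per day; the evaluation will measure whether the independence assumption actually holds in practice.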
A public technical report / paper with empirical results and recommended defenses.
A prototype monitoring tool (and evaluation harness) usable by AI infra/security teams.
Operator playbooks for detection → response, plus a responsible disclosure plan if vulnerabilities are identified.
Funding supports a final 12-month PhD completion period (June 1, 2026 – May 31, 2027) plus modest project costs:
Primary: PhD stipend to execute the work end-to-end.
Secondary: compute for experiments/evaluation; dissemination (conference travel, publication costs).
If partially funded: prioritize research time and core measurements/detector first; defer travel/publication.
Mentor: Jakub Szefer - Associate Professor of Electrical and Computer Engineering at Northwestern University; works on hardware security and secure architectures.
LinkedIn: https://www.linkedin.com/in/jakub-szefer/
Google Scholar: https://scholar.google.com/citations?hl=en&user=NO1Je2kAAAAJ
Mentor: Wenjie Xiong - Assistant Professor of Electrical and Computer Engineering at Virginia Tech; works on hardware security and trustworthy computing.
LinkedIn: https://www.linkedin.com/in/wenjie-xiong-2a85a63a/
Google Scholar: https://scholar.google.com/citations?hl=en&user=07UMduYAAAAJ
Advisor: Gabriel Kulp - Technology and Security Policy Fellow at RAND; works on secure compute infrastructure and verification/governance mechanisms.
RAND publications: https://www.rand.org/pubs/authors/k/kulp_gabriel.html
Primary author: Anthony Etim - Yale PhD candidate (hardware security and ML systems); I will lead implementation, experiments, and analysis full-time.
LinkedIn: https://www.linkedin.com/in/anthony-etim/
Google Scholar: https://scholar.google.com/citations?hl=en&user=rLp18joAAAAJ
Key risks include the following. Practical leakage rates for some channels may prove lower than expected under realistic noise/isolation.
Telemetry availability may be insufficient in some production deployments.
Some mitigations may impose unacceptable throughput/latency overhead.
Even if these risks materialize, the project still delivers a defensible taxonomy and empirical bounds that help operators prioritize mitigations (negative results are decision-relevant too).
A telemetry-only baseline monitor and guidance on which signals are actually useful and when external sensing is worth it.