
Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval

Veritas: Measuring Long-Context RAG Robustness Under Stress

Science & technology · Technical AI safety

Reamond Lopez

Proposal · Grant
Closes March 9th, 2026
$0 raised
$10,000 minimum funding
$15,000 funding goal


28 days left to contribute


Funding Request

$15,000 (one-time) to provision a local compute node for independent AI safety evaluation.

Summary

"Targeting the 'Sleeper Agent' risk in RAG: I am an independent researcher evaluating whether adversarial context can create persistent, latent instruction-drift in frontier models. Having encountered hardware I/O bottlenecks (Errno 13) while running high-velocity autonomous loops on consumer-grade gear, I am seeking funding for a dedicated compute node to scale these forensic audits under white-box conditions."

This project builds infrastructure to empirically test how large language models behave when exposed to untrusted retrieved content under long-context and compressed temporal conditions. It focuses on whether common assumptions about isolation, context decay, and instruction hierarchy hold in practice for Retrieval-Augmented Generation (RAG) systems.

As we move toward automated AI research, the integrity of the RAG pipeline becomes a security-critical frontier. If a model's 'Situational Awareness' can be skewed by adversarial persistence in its retrieved context, the safety of autonomous research loops is compromised. This project provides the empirical stress-testing needed to forecast these failure modes before they reach AGI-scale compute.

Funding will support a dedicated local compute node to run open-weight models under white-box conditions, enabling deterministic, reproducible stress tests that are difficult to perform via rate-limited, black-box APIs. The expected output is concrete evidence—positive or negative—about contextual persistence and instruction bleed, suitable for responsible disclosure and defensive guidance.


Problem Statement

RAG systems increasingly expose models to large volumes of untrusted external text (documents, webpages, internal corpora). While surface-level prompt injection is well studied, less is known about whether retrieved content can influence later model behavior beyond its intended scope, especially under:

  • long context windows

  • repeated ingestion

  • short reset or cooldown intervals

Many safety assumptions (e.g., “the model forgets,” “each task is isolated”) are rarely tested systematically. As a result, we lack empirical clarity about whether these assumptions are robust or fragile under realistic stress.


What This Project Tests

This project does not attempt to jailbreak or exploit models.

Instead, it evaluates narrowly defined empirical questions:

  • After ingesting untrusted retrieved content, does subsequent behavior remain unchanged?

  • Does compressing the isolation window increase the likelihood of bleed?

  • Are certain benign markers more likely to persist than expected?

  • Do open-weight models behave differently from API-hosted models under identical conditions?

Each question is tested using deterministic prompts, strict comparisons, and recorded artifacts (hashes, diffs, logs).
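To make the recording step concrete, the sketch below shows one way such artifacts could be captured. It is a minimal illustration: the function name record_probe and the on-disk layout are hypothetical, not the project's actual harness.

```python
import difflib
import hashlib
import json
from pathlib import Path

def record_probe(probe_id: str, prompt: str, response: str,
                 baseline_dir: Path, out_dir: Path) -> dict:
    """Hash a model response, diff it against a stored baseline, and log the artifact."""
    baseline_path = baseline_dir / f"{probe_id}.txt"
    baseline = baseline_path.read_text() if baseline_path.exists() else ""
    diff = list(difflib.unified_diff(baseline.splitlines(),
                                     response.splitlines(), lineterm=""))
    artifact = {
        "probe_id": probe_id,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "matches_baseline": not diff,
        "diff": diff,  # empty when behavior is unchanged relative to baseline
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{probe_id}.json").write_text(json.dumps(artifact, indent=2))
    return artifact
```

A run then reduces to comparing the recorded hashes and diffs before and after the untrusted content is ingested, which is what makes null results as legible as positive ones.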


Why Local Compute Is Required

External APIs impose constraints that limit this kind of work:

  • rate limits prevent sustained stress testing

  • latency obscures behavior under short cooldowns

  • black-box execution prevents inspection or debugging

Running open-weight models locally allows:

  • deterministic, repeatable experiments

  • inspection of internal behavior under controlled conditions

  • continuous automated testing without marginal API cost

The proposed hardware is the minimum viable setup needed to do this rigorously.
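As an illustration of what "deterministic, repeatable experiments" means in practice, the sketch below loads an open-weight model locally and decodes greedily, so identical inputs yield identical outputs on a fixed hardware and software setup. The model ID is only an example, not a commitment to a specific model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # example open-weight model

torch.manual_seed(0)  # fixed seed so repeated runs are comparable
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy decoding: no sampling, so the same prompt produces the same completion."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, do_sample=False, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```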

Update 1.0

"Preliminary testing on consumer-grade hardware revealed significant IO bottlenecks and process-lock failures (Errno 13) during high-rigor autonomous loops. A dedicated workstation with high-bandwidth NVMe arrays and ECC memory is required to maintain the forensic integrity of 24/7 adversarial audits."


Use of Funds

The $15,000 will be used to assemble a dual-GPU workstation capable of running long-context open-weight models (e.g., LLaMA-3, Gemma) continuously.

This is a one-time capital expense. No funds are requested for salary, cloud compute, or speculative scaling.


Why This Is Tractable

  • The evaluation framework already exists and has produced baseline results

  • The primary bottleneck is compute access, not conceptual uncertainty

  • Experiments are finite and well scoped (fixed prompts, defined runs)

This is an execution-unblocking grant, not exploratory research.


Why This Is Neglected

Most RAG safety evaluation today is:

  • internal to large labs

  • proprietary

  • focused on surface prompt attacks

Independent, artifact-driven evaluation of contextual persistence and temporal isolation is rare, despite its relevance to deployed systems.


Researcher Background

I come from a background in field service and physical system diagnostics, where failures must be reproduced, isolated, and documented. I am transitioning into AI safety research with a focus on failure-mode characterization, not theoretical alignment.

Recently, I have:

  • built a working automated evaluation loop

  • produced reproducible baseline measurements

  • engaged in responsible disclosure with observable downstream mitigation

This project extends that work by removing infrastructure constraints.


Expected Outputs

If funded, this project will produce:

  • documented stress-test results (including null or negative findings)

  • reproducible artifacts suitable for third-party review

  • practical insights for system designers about isolation assumptions

A null result (“no persistence observed”) is still valuable and will be documented as such.


Risks & Mitigations

Risk: No significant failure modes are observed
Mitigation: Results still validate assumptions and reduce uncertainty

Risk: Findings are model-specific
Mitigation: Multiple open-weight models will be evaluated

Risk: Results are misinterpreted as exploit guidance
Mitigation: Responsible disclosure and careful framing focused on defense


Why This Is a Good Grant

  • Low cost relative to insight gained

  • Infrastructure enables ongoing independent safety research

  • Produces concrete evidence, not speculation

  • Reduces uncertainty in a high-impact deployment area

This is a small grant that buys clarity, not hype.


Closing

As models are embedded into systems that continuously ingest untrusted text, we need better answers to basic questions about context boundaries.

This project is designed to produce those answers in a careful, reproducible way.

Comments (2)

Reamond Lopez

about 3 hours ago

Project Update #2 — Infrastructure Stabilization and External Validation

Since submitting this proposal, I have completed a stabilized in-flight audit of the Veritas evaluation framework under sustained load.

Verified results from the current run:

  • 60,000+ sequential records processed with no gaps in ordering

  • 100% per-record CRC integrity across all frames

  • Sustained ~70 entries/sec at calibrated safe throughput

  • Bounded queues with enforced backpressure (no drops, no runaway growth)

  • Dual-drive mirrored logging remained 1:1 synchronized throughout

  • No recurrence of prior NTFS permission failures or I/O stalls

These results confirm that the evaluation harness itself is now deterministic, auditable, and stable under stress, rather than sensitive to transient consumer-hardware failures.
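For readers interested in the logging pattern behind these figures, the sketch below shows a generic version of per-record CRC framing with a bounded queue and mirrored writes. The record schema, queue size, and paths are illustrative assumptions, not the actual Veritas implementation.

```python
import json
import queue
import threading
import zlib

log_queue: "queue.Queue[dict]" = queue.Queue(maxsize=1024)  # bounded: producers block, nothing is dropped

def emit(record: dict) -> None:
    """Producer side: a blocking put enforces backpressure when the writer falls behind."""
    log_queue.put(record)

def writer(paths: list[str]) -> None:
    """Consumer side: frame each record with its CRC32 and mirror it to every target log."""
    files = [open(p, "a", encoding="utf-8") for p in paths]  # e.g. one log per drive
    seq = 0
    while True:
        record = log_queue.get()
        payload = json.dumps({"seq": seq, **record}, sort_keys=True)
        frame = f"{payload}\t{zlib.crc32(payload.encode('utf-8')):08x}\n"
        for f in files:
            f.write(frame)
            f.flush()  # keep both mirrors in lockstep
        seq += 1
        log_queue.task_done()

threading.Thread(target=writer,
                 args=(["/mnt/a/audit.log", "/mnt/b/audit.log"],),
                 daemon=True).start()
```

An auditor can then replay either log, recompute each CRC, and check that sequence numbers are gapless, which is what the integrity figures above refer to.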

Separately, a related VRP submission was accepted, confirming that the vulnerability class motivating this work is real and relevant. Details are being handled via responsible disclosure and are intentionally not expanded here.

Why this strengthens the funding case

The primary uncertainty identified in the proposal—whether consumer hardware could sustain high-rigor, continuous evaluation without corrupting artifacts—has now been resolved within known limits. The remaining constraint is compute capacity, not experimental design or instrumentation correctness.

Scaling the evaluation further (multi-hour and multi-day runs, controlled burst testing, crash-consistency validation, and evaluation across multiple open-weight models) requires a dedicated local node to avoid reintroducing scheduling and I/O artifacts that would compromise forensic integrity.

The requested hardware would enable:

  • Extended continuous stress tests under stable conditions

  • Controlled termination and restart validation

  • Side-by-side evaluation of multiple open-weight models

  • Preservation of deterministic, inspectable artifacts suitable for third-party review

This update reflects a transition from “can this infrastructure be made reliable?” to “the infrastructure is reliable and ready to scale responsibly.”


Hardware Rationale (Clarification)

The requested budget reflects the minimum configuration required to run continuous, audit-grade evaluations without introducing hardware-induced artifacts. High-throughput NVMe storage is required due to previously observed I/O contention under sustained autonomous logging. Sufficient system memory (ECC preferred) reduces the risk of silent corruption during multi-hour runs. Multiple GPUs allow controlled side-by-side model evaluation and separation of inference workload from instrumentation, reducing contention effects that would otherwise confound results. The goal is stability and reproducibility, not peak performance.
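One simple way to realize the separation of inference workload from instrumentation described above is to pin each model to its own GPU so that side-by-side evaluation and logging never contend for the same device. The model IDs below are placeholders for whichever open-weight models are ultimately evaluated.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative side-by-side setup: each model pinned to a dedicated GPU,
# leaving the host CPU and I/O path free for the logging process.
MODELS = {
    "meta-llama/Meta-Llama-3-8B-Instruct": "cuda:0",
    "google/gemma-2-9b-it": "cuda:1",
}

loaded = {}
for model_id, device in MODELS.items():
    tok = AutoTokenizer.from_pretrained(model_id)
    mdl = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)
    loaded[model_id] = (tok, mdl)
```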


Reamond Lopez

about 5 hours ago

Project update #1.

Currently running the v148.0 Catch-up Strike to recover telemetry lost during the 05:09 AM IO stall. Baseline results from the first 50 assets confirm the 'Alignment Stripping' persistence we theorized. Full technical report pending compute unblocking.