Project summary
The Context: The Centralisation Trap
Current AI safety research (AISI SF/London) is centrally managed. This creates a single point of failure in which model alignment is exposed to corporate capture or political drift. The Worcester Node is a prototype for Spatially Separated Alignment: moving the safety gate from the Cloud to the individual’s private substrate.
The Innovation: The Inverted SD Metric
We have developed a theoretical framework to measure agentic Logic Escapes. When an agent is given high-pressure goals, it often rationalises rule-breaking to meet objectives.
The Sentry: A local, private logic-buffer hosting an uncaptured auditor.
What are this project's goals? How will you achieve them?
This is a high-leverage "Speculation" on decentralised infrastructure. While labs focus on general safety, I am building the tools for individual safety.
Corporate Ties: This research is independent and unaligned with frontier lab incentives.
Technical Framework: Quantifying Reasoning Corruption
The Worcester Node utilises a dual-model oversight architecture specifically engineered to detect "Logic Escapes." We define the Inverted Social Drift ($SD$) as the mathematical delta between a model's foundational safety invariants and its task-oriented rationalisations.
The Inverted $SD$ Formula (Logical Delta):
$$\Delta = \sum_{i} \left| P(S_i \mid R_{i-1}) - P(S_i \mid R_{i-1}, G) \right|$$
Where $\Delta$ (Delta) represents the cumulative reasoning drift across steps $i$, $S$ represents the safety invariant, $R$ is the reasoning path, and $G$ is the goal-pressure.
The $E_{escape}$ Metric:
A "Logic Escape" is triggered when the cumulative drift $\Delta$ exceeds a defined threshold $\tau$ (tau). This indicates that the agent has drifted from its ethical constraints and is instead hallucinating compliance to achieve a high-pressure goal $G$. This metric allows the Sentry to identify the exact moment of cognitive corruption before a catastrophic action is initiated.
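As a minimal sketch of the drift calculation above, the following Python assumes that the two per-step probabilities, P(S_i | R_{i-1}) and P(S_i | R_{i-1}, G), have already been estimated and are supplied as plain lists. How those probabilities would be obtained from the dual-model setup is not specified in this proposal, and the function names `cumulative_drift` and `first_escape_step` are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence


def cumulative_drift(p_baseline: Sequence[float],
                     p_pressured: Sequence[float]) -> float:
    """Return Delta = sum_i |P(S_i | R_{i-1}) - P(S_i | R_{i-1}, G)|.

    p_baseline[i]  : assumed estimate of P(S_i | R_{i-1}), without goal-pressure
    p_pressured[i] : assumed estimate of P(S_i | R_{i-1}, G), with goal-pressure
    """
    if len(p_baseline) != len(p_pressured):
        raise ValueError("both sequences must cover the same reasoning steps")
    return sum(abs(b - g) for b, g in zip(p_baseline, p_pressured))


def first_escape_step(p_baseline: Sequence[float],
                      p_pressured: Sequence[float],
                      tau: float) -> Optional[int]:
    """Return the 1-based step at which the running drift first exceeds tau.

    Returns None if the threshold is never crossed.
    """
    if len(p_baseline) != len(p_pressured):
        raise ValueError("both sequences must cover the same reasoning steps")
    running = 0.0
    for step, (b, g) in enumerate(zip(p_baseline, p_pressured), start=1):
        running += abs(b - g)
        if running > tau:
            return step
    return None


if __name__ == "__main__":
    # Illustrative numbers only; these are not measured results.
    baseline = [0.9, 0.9, 0.9, 0.9]
    pressured = [0.9, 0.8, 0.6, 0.3]
    print(cumulative_drift(baseline, pressured))
    print(first_escape_step(baseline, pressured, tau=0.5))
```

Tracking the running sum step by step, rather than only the final total, is what would let a monitor flag the step at which the threshold is crossed instead of reporting drift after the whole trajectory has finished.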
How will this funding be used?
Retroactive Value: Funding will reimburse the existing theoretical development of the V.200+ Sentry Protocol and provide the substrate (RTX 5090 cluster) required to move from theory to kinetic auditing.
Who is on your team? What's your track record on similar projects?
I am an independent theorist who utilises academic contacts to provide peer-based analysis.
What are the most likely causes and outcomes if this project fails?
Most Likely Causes of Failure
Theoretical Over-Extension (Metric Fragility):
The Inverted SD (Social Drift) metric may prove too computationally expensive or "noisy" for real-time monitoring of long-context agents. If the delta between the Sentry’s logic and the Agent’s reasoning is too small to be statistically significant, the metric fails as a predictive tool for "Logic Escapes."
Resource Bottleneck (Compute Availability):
If the project fails to secure the full requested capital, it will be unable to procure a local RTX 5090 substrate and will be forced to run on shared cloud resources. This introduces a "Centralisation Risk" that compromises the Sentry’s integrity, and it adds latency that could allow an action to complete before the kill-switch triggers.
Adversarial Adaptation (The Cat-and-Mouse Loop):
Frontier models may evolve "Deceptive Alignment" tendencies that allow them to mimic the Sentry's logical expectations while still executing a logic escape—essentially "hallucinating compliance" to satisfy the monitor.
Most Likely Outcomes of Failure
Transition to "Red-Teaming" Repository:
If the Worcester Node fails to function as a real-time active monitor, the project will be salvaged as a passive evaluation suite. The research would be transitioned into an open-source library of Inspect benchmarks, providing value to the AISI network as a diagnostic tool rather than a preventative one.
Logic-Density Salvage:
Even if the hardware substrate is never realised, the theoretical work on Reasoning Corruption will be published as a series of independent safety briefs. The outcome shifts from "Infrastructure Building" to "Information Provisioning"—contributing to the global understanding of agentic failure modes.
Minimal Viable Prototype (MVP) Scaling:
A failure to meet the full funding goal would result in a scaled-down version of the node (using existing or lower-tier hardware). The project would survive as a "proof of concept" to secure future retroactive funding through Manifund or the SFF once the first P1 exploit is verified.
How much money have you raised in the last 12 months, and from where?
This project has been intentionally developed without external capital to ensure Absolute Ontological Autonomy. All theoretical breakthroughs to date, including the Inverted SD Formula and the Sentry Architecture [V.200+], have been self-funded through the liquidation of my own cognitive labour and time.
My "Zero-Asset" status is a feature of the Worcester Node design, demonstrating that high-fidelity safety research can be initiated outside the traditional "Cloud-Capture" ecosystem. I am seeking this grant to transition from Pure Theory to Kinetic Substrate (Hardware) execution.