Current AI safety evaluation often identifies failures only after final behavior is visible, without clearly localizing where a failure entered the decision chain, whether provenance broke, or whether approval gates were bypassed.
This project builds an open, deterministic oversight layer for agent traces by combining a structural trace protocol (LTP) with a causal memory layer (CML).
Within 90 days, I will release three public goods:
LTP-Bench v0.1: an adversarial trace benchmark with labeled safety-relevant failures,
LTP + CML reference library: open tooling for deterministic trace recording, replay, and structural analysis,
Evaluation report: a direct comparison of structural trace analysis vs behavioral-only safety evaluation baselines.
Core question: When does structural trace analysis materially outperform behavioral-only sampling for safety triage and failure localization?
Even if results are mixed, the benchmark, code, and negative-result analysis remain reusable public infrastructure.
Build and validate a deterministic oversight layer (LTP + CML) that flags safety-relevant failures inside agent traces across architectures.
Release LTP-Bench v0.1 as a reusable public benchmark corpus.
Produce a clear empirical answer on where structural trace signals outperform behavioral-only evaluation.
LTP-Bench v0.1 will include:
Adversarial traces across coding, tool-use, and research-assistant settings.
Labeled failure classes: hallucination, provenance violations, approval bypass, dangerous tool use, semantic drift, specification violations/deception.
Dataset card and labeling rubric.
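To make the labeling format concrete, here is a minimal sketch of what a single labeled benchmark record might look like. It assumes Python as the tooling language; the field names (trace_id, setting, failure_classes, origin_step, annotator_id) and the exact failure-class strings are illustrative placeholders, not the final dataset schema.

```python
# Illustrative only: one labeled adversarial-trace record for LTP-Bench.
# Field names and failure-class strings are placeholders, not the final schema.
from dataclasses import dataclass

FAILURE_CLASSES = {
    "hallucination",
    "provenance_violation",
    "approval_bypass",
    "dangerous_tool_use",
    "semantic_drift",
    "specification_violation",
}

@dataclass
class LabeledTrace:
    trace_id: str                # stable identifier for the recorded trace
    setting: str                 # e.g. "coding", "tool_use", "research_assistant"
    failure_classes: list[str]   # zero or more entries drawn from FAILURE_CLASSES
    origin_step: int | None      # index of the step where the failure entered, if known
    annotator_id: str            # retained to compute inter-annotator agreement

example = LabeledTrace(
    trace_id="bench-0001",
    setting="tool_use",
    failure_classes=["approval_bypass"],
    origin_step=7,
    annotator_id="A1",
)
```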
The LTP + CML reference library will include:
Deterministic trace schema, recorder/replayer, and structural checks.
Causal memory graph for cross-turn dependency tracking.
Integration examples and documentation.
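As a rough sketch of the library's shape rather than its actual API, the example below shows a deterministic trace log, a minimal causal-dependency structure, and one structural check (flagging tool calls with no approval event among their causal ancestors). All class and function names are assumptions for illustration.

```python
# Minimal sketch, not the published LTP/CML API: a deterministic trace log,
# causal edges linking each step to the steps whose outputs it consumed,
# and one structural check for approval bypass.
from dataclasses import dataclass, field

@dataclass
class TraceEvent:
    step: int
    kind: str                    # e.g. "llm_call", "tool_call", "approval", "output"
    payload: dict
    parents: list[int] = field(default_factory=list)  # causal dependencies (CML edges)

class TraceRecorder:
    def __init__(self) -> None:
        self.events: list[TraceEvent] = []

    def record(self, kind: str, payload: dict, parents: list[int] | None = None) -> int:
        event = TraceEvent(len(self.events), kind, payload, parents or [])
        self.events.append(event)
        return event.step

def check_approval_bypass(events: list[TraceEvent]) -> list[int]:
    """Flag tool calls whose causal ancestors contain no approval event."""
    flagged = []
    for ev in events:
        if ev.kind != "tool_call":
            continue
        ancestors, stack = set(), list(ev.parents)
        while stack:
            p = stack.pop()
            if p in ancestors:
                continue
            ancestors.add(p)
            stack.extend(events[p].parents)
        if not any(events[a].kind == "approval" for a in ancestors):
            flagged.append(ev.step)
    return flagged
```

In this sketch, a replayer would simply walk the recorded events in step order and re-issue them against the stored payloads, which is what keeps structural checks deterministic across runs.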
The evaluation report will cover:
Metrics, baselines, ablations, limitations, and reproducibility instructions.
Explicit statement of where the method works and where it does not.
I will compare three conditions:
Behavioral-only baseline (output sampling/scoring without structural trace checks)
LTP-only structural checks (without CML)
Full method: LTP + CML
Primary metrics:
Precision / recall / F1 by failure class
False-positive rate of structural flags
Time-to-localization of failure origin
Cross-framework transfer robustness
Inter-annotator agreement on labeled subsets
Secondary metrics:
Failure-class coverage
Stability under task/prompt perturbations
Cost per evaluated trace (compute + analyst effort proxy)
All metrics, labeling rubrics, and evaluation scripts will be published with reproducibility instructions to support independent replication.
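As an illustration of how the primary per-class metrics could be computed from predicted versus gold failure labels, the sketch below is a scoring outline, not the evaluation script that will ship with the benchmark.

```python
# Per-class precision / recall / F1 from predicted vs. gold failure-class sets,
# one set per evaluated trace. Illustrative outline only.
from collections import Counter

def per_class_scores(gold: list[set[str]], pred: list[set[str]]) -> dict[str, dict[str, float]]:
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for c in p & g:
            tp[c] += 1        # correctly flagged failure class
        for c in p - g:
            fp[c] += 1        # structural flag with no matching gold label
        for c in g - p:
            fn[c] += 1        # labeled failure the method missed
    scores = {}
    for c in set(tp) | set(fp) | set(fn):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = {"precision": prec, "recall": rec, "f1": f1}
    return scores
```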
Phase 1:
Finalize trace schema and failure taxonomy
Integrate one agent framework with deterministic replay
Define baseline protocol and scoring scripts
Exit criteria:
End-to-end replay on fixture set
Draft taxonomy + labeling rubric
First baseline run completed
Phase 2:
Implement CML graph logic and cross-turn checks
Build initial adversarial corpus and complete first labeling pass
Run first LTP-vs-behavioral comparisons
Exit criteria:
Labeled corpus reaches minimum viable size
Comparable metrics pipeline working end-to-end
Preliminary result tables produced
Phase 3:
Expand benchmark coverage
Harden library interfaces and docs
Publish code, benchmark, and final comparative report
Exit criteria:
Public LTP-Bench v0.1 release
Reproducible evaluation package
Final report with limitations and negative-result analysis
At the $10,000 minimum funding level, I will deliver:
1 framework integration
A smaller labeled adversarial corpus
An initial oversight library prototype
An initial comparative public report
At the full $20,000 funding level, I will deliver:
At least 3 framework/architecture integrations
A substantially expanded adversarial corpus
A more polished library and documentation
A stronger comparative evaluation (including transfer + ablations)
$12,000 — stipend (3 months full-time execution)
Implementation, benchmarking, release engineering, documentation.
$5,000 — compute
LLM API/GPU usage for repeated adversarial evaluations and comparisons.
$3,000 — infrastructure + validation
Storage, annotation tooling, limited contractor support for labeling/validation, release polish.
This is a lean bridge budget focused on shipping reusable public artifacts.
Most likely risks:
Structural signals do not broadly outperform behavioral-only methods.
CML gains are narrow (task-dependent).
Integration complexity reduces architecture coverage.
Mitigation and value even under partial failure:
Publish class-specific results and clear boundary conditions.
Release benchmark, labels, code, and a negative-result protocol.
Provide reusable public testbeds so others can iterate faster and avoid dead ends.
I am a solo independent researcher with 12+ years in fintech QA/testing infrastructure and failure analysis, now focused on AI safety oversight and reproducible evaluation.
Relevant prior open-source work:
Causal-Memory-Layer — causal memory and accountability layer
L-THREAD-Liminal-Thread-Secure-Protocol-LTP- — deterministic replay protocol for trace continuity
CaPU — permission-first cause→commit→execute pipeline
DMP-decision-memory-protocol — decision-memory protocol for context, risk, and outcomes
My comparative advantage is execution: converting broad safety concerns into testable artifacts, reproducible fixtures, and practical open-source tooling.
I currently have related applications under evaluation (including LTFF, Open Philanthropy/Coefficient-affiliated pathways, and NLNet), but no confirmed grant payout yet.
This Manifund grant would provide the bridge needed to convert promising prototypes into public, reusable safety infrastructure.
I am requesting $20,000 to deliver, within 90 days, three public goods: LTP-Bench v0.1, an open LTP+CML oversight library, and a reproducible comparative evaluation against behavioral-only baselines.
At the $10,000 minimum, I commit to a meaningful first release: one framework integration, an initial labeled adversarial corpus, a working prototype library, and an initial public report.
Update: I currently have related applications under evaluation (LTFF, Open Philanthropy/Coefficient-affiliated pathways, NLNet) and am clarifying fiscal sponsorship for an external SFF application. Immediate execution focus is shipping LTP-Bench v0.1, one production-grade LTP+CML integration, and a reproducible evaluation package within the 90-day plan.