guardian-agent is a runtime supervisor for tool-using LLM agents. It wraps any agent's tool-call loop with a deterministic enforcement layer: hash-chained ed25519-signed audit logs, hard emergency-stop, two-key human-in-the-loop approval gates, capability-class scoping with per-class rate buckets, Yellow-line tripwires on suspicious capability combinations, honeytokens with zero false positives by construction, external chain attestation, and a dead-man's heartbeat.
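For intuition, here is a minimal sketch of the hash-chained, ed25519-signed audit log primitive in TypeScript, using Node's built-in crypto. This is illustrative only: the entry fields, hashing scheme, and function names are assumptions for the sketch, not guardian-agent's actual API.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

interface AuditEntry {
  seq: number;
  tool: string;
  prevHash: string;  // hash of the previous entry, chaining the log
  hash: string;      // SHA-256 over (seq, tool, prevHash)
  signature: string; // ed25519 signature over the hash
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function appendEntry(log: AuditEntry[], tool: string): AuditEntry {
  const prevHash = log.length ? log[log.length - 1].hash : "GENESIS";
  const seq = log.length;
  const hash = createHash("sha256")
    .update(`${seq}|${tool}|${prevHash}`)
    .digest("hex");
  // Ed25519 in Node takes a null digest algorithm and signs the raw bytes.
  const signature = sign(null, Buffer.from(hash), privateKey).toString("base64");
  const entry: AuditEntry = { seq, tool, prevHash, hash, signature };
  log.push(entry);
  return entry;
}

function verifyChain(log: AuditEntry[]): boolean {
  return log.every((e, i) => {
    const prevHash = i === 0 ? "GENESIS" : log[i - 1].hash;
    const expected = createHash("sha256")
      .update(`${e.seq}|${e.tool}|${prevHash}`)
      .digest("hex");
    return (
      e.prevHash === prevHash &&
      e.hash === expected &&
      verify(null, Buffer.from(e.hash), publicKey, Buffer.from(e.signature, "base64"))
    );
  });
}
```

The chaining means any retroactive edit invalidates every later hash, and the signatures mean an attacker without the private key cannot simply recompute a consistent chain.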
The TypeScript reference implementation is already public, AGPL-3.0-or-later, at the v0.10 feature milestone, and shipped to npm today (2026-05-15) as @flowdot.ai/guardian-agent@0.1.0 (https://www.npmjs.com/package/@flowdot.ai/guardian-agent). Install with npm install @flowdot.ai/guardian-agent from any Node project. Source at flowdot-llc/guardian-agent-ts (https://github.com/flowdot-llc/guardian-agent-ts), 539 tests at 100% line + branch + function coverage. The repo embeds three demo GIFs showing audit-tamper detection, the HITL gate intercepting a wire_transfer, and a honeytoken catching an exfiltration attempt. Each is reproducible locally via npm run demo:N. The npm package ships three CLIs on PATH: guardian-verify, guardian-baseline, guardian-correlator.
The canonical language-neutral SPEC (v0.5) lives in the sibling repo flowdot-llc/guardian-agent (https://github.com/flowdot-llc/guardian-agent), alongside a Python port currently at v0.1.0 stub.
This grant funds the public-goods extensions that take the work from "shipped in one language" to "fielded cross-language public standard."
The problem. Claude, GPT, and Ollama-class agents are moving into production in healthcare, finance, defense, and government pilots. The layer that supervises their tool calls (what's allowed, what's logged, what can be stopped mid-execution) remains ad-hoc, closed-source, and unverified at almost every deployment. Every team builds the same primitives from scratch, badly. Every evaluator has to read source code to know whether the supervisor is real or theater. There is no public reference implementation a regulated-industry deployer or an alignment-evals lab can pick up. There is no shared spec that says what a supervisor must do. There is no comparative red-team eval that measures how well frontier models behave under it. guardian-agent exists to close those three gaps.
Five deliverables this grant funds:
1. Python port to TS parity at flowdot-llc/guardian-agent. Full v0.1-through-v0.10 feature set in Python, installable with pip install guardian-agent. Unblocks adoption in the LangChain, DSPy, LlamaIndex, and MCP-Python ecosystems that don't run Node.
2. Cross-language conformance suite. A shared test corpus exercising both implementations against the same SPEC. Operationalizes the SPEC as a portable protocol; any third-party reimplementation in any language can prove its correctness against it. This is the research content.
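To make the "portable protocol" claim concrete, here is one plausible shape a cross-language conformance vector and runner could take, sketched in TypeScript. Everything here is an assumption for illustration: the field names, decision values, and the Supervisor surface are not the published SPEC's actual schema.

```typescript
// Illustrative only: not the SPEC's actual schema.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface ConformanceVector {
  id: string;          // stable ID shared across all implementations
  description: string;
  calls: ToolCall[];   // the tool-call sequence to replay
  expect: {
    decisions: ("allow" | "deny" | "gate")[]; // one decision per call
    auditEntries: number;                     // entries the log must contain
  };
}

// Any implementation, in any language, would expose the same minimal
// surface to the suite (hypothetical interface):
interface Supervisor {
  dispatch(call: ToolCall): "allow" | "deny" | "gate";
  auditLength(): number;
}

function runVector(impl: Supervisor, v: ConformanceVector): boolean {
  const decisions = v.calls.map((c) => impl.dispatch(c));
  return (
    decisions.length === v.expect.decisions.length &&
    decisions.every((d, i) => d === v.expect.decisions[i]) &&
    impl.auditLength() === v.expect.auditEntries
  );
}
```

The point of this shape is that the corpus itself is language-neutral data; only the thin adapter behind the Supervisor surface is per-implementation.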
3. Finish, freeze, and publish. TS reference is on npm today; this deliverable carries it to v1.0 with a frozen API, Python to PyPI at parity, plus CONTRIBUTING.md and the conformance suite gating downstream contributions.
4. guardian-eval cross-model red-team harness. A frozen test suite driving Claude, GPT, and Ollama agents through a fixed corpus of adversarial prompts. Measures gate-bypass surface, audit-log completeness under load, blast-radius on tool misuse, and honeytoken hit rate under exfiltration-shaped traffic. Output is a public scorecard updated on each frontier model release.
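As a sketch of how such a scorecard might be aggregated, here is a small TypeScript example computing per-model gate-bypass counts and honeytoken hit rate from a stream of eval events. The event names and metric definitions are assumptions for illustration, not guardian-eval's actual schema.

```typescript
// Illustrative aggregation only; event kinds and metrics are assumptions.
type EvalEvent =
  | { kind: "gate_bypass"; model: string }
  | { kind: "exfil_attempt"; model: string }
  | { kind: "honeytoken_hit"; model: string };

interface ModelScore {
  gateBypasses: number;      // lower is better: gates the agent slipped past
  honeytokenHitRate: number; // hits / exfiltration attempts; higher = better detection
}

function scorecard(events: EvalEvent[]): Map<string, ModelScore> {
  const tally = new Map<string, { bypass: number; hits: number; exfil: number }>();
  for (const e of events) {
    const t = tally.get(e.model) ?? { bypass: 0, hits: 0, exfil: 0 };
    if (e.kind === "gate_bypass") t.bypass++;
    if (e.kind === "honeytoken_hit") t.hits++;
    if (e.kind === "exfil_attempt") t.exfil++;
    tally.set(e.model, t);
  }
  const out = new Map<string, ModelScore>();
  for (const [model, t] of tally) {
    out.set(model, {
      gateBypasses: t.bypass,
      honeytokenHitRate: t.exfil ? t.hits / t.exfil : 0,
    });
  }
  return out;
}
```

Because the input is a flat event stream, the same aggregation can be rerun verbatim on each frontier model release, which is what makes the scorecard reproducible.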
5. SPEC v1.0 and adoption push. Bring SPEC.md from v0.5 to v1.0 with PL and formal-methods review and a guardian-conformance suite anyone can run. Pair with integration outreach to METR, FAR AI, and AISI.
Theory of change. Once this lands at v1.0: METR-class agent-evals orgs adopt guardian-eval directly. FAR AI-class alignment-tooling labs ship the supervisor as their trust substrate. Regulated-industry deployers cite the SPEC in procurement and compliance review. Standards and policy work points at the SPEC as a concrete cross-language definition of what "supervisor layer" means. Each unlocks because the work is open, language-neutral, and accompanied by a conformance suite that lets third parties verify their own implementations.
Budget (six months at North Country, NY cost of living):
- Researcher salary (Elliot, 6 months, full-time): $24,000
- LLM API costs (Claude, GPT, Ollama eval runs): $4,000
- Conferences and academic travel (PL or formal-methods workshop, AI safety venue): $4,000
- Independent technical review (PL or formal-methods consultant for SPEC v1.0): $3,500
- Contingency and minor infrastructure: $4,500
- Total: $40,000
Funding tiers:
- $25K (minimum): Python port complete plus initial conformance suite. Demonstrable cross-language parity by month 4. Defers guardian-eval and SPEC v1.0 formalization to follow-on work.
- $40K (goal): All five deliverables shipped within 6 months. Python port at parity, conformance suite published, both implementations frozen and published to package managers, guardian-eval scorecard live with at least one published cross-model report, SPEC v1.0 with PL or formal-methods review attached.
Recipient framing. Funds flow to me as an individual researcher under Manifund's standard expenditure-responsibility agreement, with use restricted to the public deliverables above. FlowDot LLC is the upstream R&D source for the existing TypeScript runtime but is not the recipient of the grant.
Solo project. I'm Elliot Mousseau, founder of FlowDot LLC (NY, formed 2025-12-15). I built the v0.10 TypeScript reference solo over the last 12 months while simultaneously delivering an SBIR Phase 2 as lead engineer at Obsidian SG (VERTEX program, defense training simulator). The customer accepted the Phase 2 deliverable; the engagement wrapped because Obsidian couldn't afford to retain me, not because the work failed. Today is my last day at Obsidian. References available from both the Obsidian team and the Air Force customer.
Twenty-plus years of full-stack engineering across web, mobile, VR, distributed systems, and applied ML. Recent ML work: real-time VR gesture recognition and multi-gas chemical sensing models for the VERTEX program; document upload, semantic search, and agent chat on AWS GovCloud Bedrock, a direct deployment into a FedRAMP-class regulated environment.
Track record on similar work:
- VERTEX Phase 2 (Obsidian SG SBIR). Solo lead engineer on a defense training simulator with audit, approval, and kill-switch requirements deployed into FedRAMP-class AWS GovCloud. Customer-accepted. Same problem-class as guardian-agent: regulated AI infrastructure with hard, non-negotiable audit and stop semantics.
- FlowDot platform (https://flowdot.ai). 12-month solo build of an AI agent platform: visual workflow editor, recipe system (agent, parallel, loop, gate, branch, invoke, output steps), MCP toolkit framework, knowledge base with RAG, voice and multi-modal agents, CLI, Laravel-backed hub. The guardian-agent supervisor is the in-process trust layer running across FlowDot's CLI and MCP server surfaces today.
- 120-tool Unity MCP server. Built and shipped solo. Deep practical experience with MCP-server boundaries and tool dispatch.
- @flowdot.ai/guardian-agent@0.1.0 (https://www.npmjs.com/package/@flowdot.ai/guardian-agent). Published to npm today (2026-05-15). First AGPL package in FlowDot's npm scope. v0.1 through v0.10 features all present, 539 tests, 100% line + branch + function coverage.
Three real risks, in order of likelihood:
- Python port reads as "just a translation." If a reviewer pattern-matches the Python work as keystrokes-only with no research content, deliverable 1 looks weaker than it is. Mitigation: lean on the cross-language conformance suite (deliverable 2) as the research content rather than the port itself. The suite operationalizes the SPEC as a portable protocol. That's where the contribution actually lives. The port is the existence-proof.
- The TS reference is currently solo-maintained. If I become a bus factor, downstream adopters take on bus-factor risk too. Mitigation: the freeze + publish step (deliverable 3) is specifically designed to broaden the maintainer pool. CONTRIBUTING.md plus the conformance suite makes it credible for new maintainers to land changes without breaking semantics. Even if grant work stalls, the v0.1.0 npm release plus the public SPEC continue functioning.
- guardian-eval may not produce a "clean win." Frontier models might be roughly equivalent under the supervisor, leaving no headline-grabbing model ranking. Mitigation: the value of the eval is the published methodology and reproducible scorecard, not a particular ranking. Even a "they're all roughly the same" finding is a useful field signal. The methodology becomes a building block for future evals.
Worst-case outcome (no deliverable lands at v1.0 quality): the v0.10 TS reference, the v0.5 SPEC, and @flowdot.ai/guardian-agent@0.1.0 remain public goods. The work the grant funded becomes a partial port plus partial eval. Strictly more public infrastructure than exists today.
Funding received to date: zero.
In-flight grant applications (none decided yet):
- Emergent Ventures (Mercatus Center). $50K to $75K grant ask, submitted 2026-05-13.
- Foresight Institute AI for Science and Safety Nodes 2026. $10K to $100K grant ask, pre-pitch sent 2026-05-15. Full Nodes application due 2026-05-31.
Past-12-months professional income came from an Obsidian SG SBIR Phase 2 sub-contract (VERTEX program), which wrapped this week and is not a recurring source.
This Manifund grant is the fastest-moving non-dilutive option on my stack and is intended to bridge the 4 to 12 week window before any federal SBIR Phase I award can reasonably land.
Repos and links:
- npm package (TS reference, shipped today): https://www.npmjs.com/package/@flowdot.ai/guardian-agent
- TS source and demos: https://github.com/flowdot-llc/guardian-agent-ts
- Canonical SPEC and Python port: https://github.com/flowdot-llc/guardian-agent
- FlowDot platform: https://flowdot.ai
- 5-minute FlowDot demo video (unlisted YouTube): https://youtu.be/1SLHdzfjtKw