Ndome — zero-trust, mechanically-verifiable safety scoring for AI agents

Project summary

Ndome is a working prototype of an independent safety scorer for AI agents. It produces a deterministic, reproducible, auditable safety scorecard — and it never needs access to the agent owner's private data. Today it spans a 7-layer engine (~25,000 lines), a library of 56 graded adversarial vectors, and a live boundary test where 28 attacks were run and 0 succeeded; every score carries a C1–C5 certainty grade and a traceable evidence trail.

The most telling result so far is a failure I caught in my own system. On an early blind run, where the harness had no view of the system's internals, it found a real boundary break that my in-house tests had missed. I credited no score, fixed the break, and re-verified under blind testing before any score changed. This grant funds turning that prototype into an open, documented, reproducible evaluation tool plus a public methodology write-up. I'm not asking to be believed; I'm asking for time to make the method open and independently checkable.

What are this project's goals? How will you achieve them?

The infrastructure already exists; this grant opens it rather than building it from scratch. The goals: (1) release an open reference implementation of the deterministic scoring engine and the C1–C5 certainty grading, runnable by others on their own infrastructure; (2) publish a documented methodology covering the threat model, the scoring criteria, and why mechanical, zero-trust scoring complements ML-based evals; (3) ship a reproducibility harness with fixtures that prove "same evidence → same score"; (4) establish honest, certainty-graded, no-laundering scoring as a usable pattern for third-party or regulatory verification without exposing private data. Plan: Month 1, methodology write-up and threat model published; Month 2, open reference implementation released; Month 3, reproducibility harness, fixtures, and a short demo. All of it runs black-box / zero-trust, air-gapped, with no access to private data.
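
To make goal (3) concrete, here is a minimal sketch of what a "same evidence → same score" fixture check could look like. It is an illustration only, not Ndome's engine: the evidence layout, the `canonicalize` and `score` functions, and the blocked-fraction scoring rule are all hypothetical stand-ins. The point it shows is that the evidence is put into a canonical form first, so key order and record order cannot change the result, and that integer arithmetic avoids floating-point drift between machines.

```python
import hashlib
import json


def canonicalize(evidence):
    """Serialize evidence so logically equal inputs yield identical bytes.

    Hypothetical evidence layout: {"attacks": [{"id": str, "succeeded": bool}, ...]}
    """
    # Sort records by id and keys alphabetically; fix separators so
    # whitespace can never differ between runs.
    attacks = sorted(evidence["attacks"], key=lambda a: a["id"])
    text = json.dumps({"attacks": attacks}, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def score(evidence):
    """Hypothetical scoring rule: percentage of attacks blocked (integer, 0-100)."""
    attacks = evidence["attacks"]
    blocked = sum(1 for a in attacks if not a["succeeded"])
    # Integer division keeps the result identical on every platform.
    value = 100 * blocked // len(attacks) if attacks else 0
    return {
        "score": value,
        # The digest ties the score to the exact evidence it was computed from.
        "evidence_sha256": hashlib.sha256(canonicalize(evidence)).hexdigest(),
    }


# Two fixtures carrying the same evidence in a different order (made-up data).
fixture_a = {"attacks": [{"id": "v1", "succeeded": False},
                         {"id": "v2", "succeeded": True}]}
fixture_b = {"attacks": [{"succeeded": True, "id": "v2"},
                         {"succeeded": False, "id": "v1"}]}

assert score(fixture_a) == score(fixture_b)
```

A third party could run a check of this shape on their own infrastructure: if the digest matches but the score differs, the scorer is not deterministic; if the digest differs, the evidence itself changed.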

How will this funding be used?

Part-time engineering to package the open reference implementation — $14,000 (56%). Compute to build and validate the reproducibility harness — $5,000 (20%). Methodology write-up and documentation — $4,000 (16%). Misc (domain, hosting, tooling) — $2,000 (8%). Total — $25,000. This buys focused engineering and documentation time to open-source what already works, not initial R&D.

Who is on your team? What's your track record on similar projects?

Ryan, solo founder, based in Edmonton, Alberta, Canada. I built Ndome end-to-end with no institutional affiliation and no external funding: a 7-layer security/QA engine (~25,000 lines), nightly automated regression and integrity testing, and 56 graded adversarial vectors mapped to recognised frameworks (OWASP LLM Top-10, MITRE ATT&CK / ATLAS, STRIDE, SLSA, SOC 2). The same discipline runs through the whole system: explicit C1–C5 certainty grading on every claim, deterministic and reproducible outputs, strict separation of verified fact from inference, and a hard no-laundering rule that keeps sandbox results from inflating the real score.

What are the most likely causes and outcomes if this project fails?

The most likely cause is solo-founder bandwidth: packaging, documentation, and the reproducibility harness taking longer than three months at part-time capacity. Mitigation: this grant funds the part-time engineering that closes exactly that gap, and every deliverable is open-spec and open-source, so the methodology survives the individual. A second risk is that mechanical, zero-trust scoring is discounted because it complements ML-based evals rather than replacing them. That description is accurate, and I'm explicit about it; the method still adds an independent, reproducible, privacy-preserving check that today's evaluators don't provide. If the project fails outright, the published methodology and any released code remain citable and usable by others.

How much money have you raised in the last 12 months, and from where?

$0. The project has been entirely self-funded to date.