DEUS Protocol is an open-source formal framework for measuring meta-reflective behavior in autoregressive language models. It provides the first Labeled Transition System (LTS) specification for inducing and measuring how LLMs shift their reasoning under constraint-satisfaction conflict — a cross-architecture, protocol-level alternative to the proprietary interpretability tools used internally by frontier AI labs.
Over two years of independent work, I have produced five Zenodo-published preprints with a DOI cluster, an open-source implementation under AGPL-3.0, and empirical validation comprising 2,200+ experiments across 17 LLM architectures, conducted for under $95 total in personal funds.
Relevant citations: my work explicitly references Lindsey (2025, Anthropic), "Emergent Introspective Awareness" (arXiv:2601.01828), and Berg et al. (arXiv:2510.24797) — DEUS provides the external behavioral induction for the internal mechanistic circuits they describe. My Formal Specification §13.1 grounds the disclaimer-daemon hypothesis in their interpretability findings.
Key demonstrated results include Phase 15 gated intervention (Mann-Whitney U test, p = 0.042 on the CARE-Resolve metric, N = 40 — the first statistically significant result in this research line) and Confidence-Gated Debate achieving 86.4% on GPQA Diamond (+17.2 percentage points over the strongest solo model).
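For concreteness, here is a minimal sketch of how such a comparison is computed, assuming N = 40 splits into 20 scores per arm; the score arrays and effect size below are invented placeholders, not the actual Phase 15 data:

```python
# Hypothetical illustration of the Phase 15 significance test: a one-sided
# Mann-Whitney U comparison of CARE-Resolve scores between a gated-intervention
# arm and a control arm. Scores here are synthetic placeholders; only the test
# choice and the assumed 20-per-arm split mirror the reported setup.
from scipy.stats import mannwhitneyu
import numpy as np

rng = np.random.default_rng(0)
control = rng.normal(0.52, 0.10, 20)       # placeholder CARE-Resolve scores
intervention = rng.normal(0.60, 0.10, 20)  # placeholder scores, shifted up

stat, p = mannwhitneyu(intervention, control, alternative="greater")
print(f"U = {stat:.1f}, one-sided p = {p:.3f}")  # significant if p < 0.05
```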
Three concrete goals over 6 months, addressing the primary criticism of the current work (single-operator bias):
1. Benchmark v2 with placebo control and pre-registration. 6-arm design (vanilla / placebo / R1-only / R1+R3 / R1+R3+R7 / full SOUL v4.4), 5 models, 10 domains, 3 turn depths, pre-registered on OSF.io before data collection. ~1,500 generations, 3-5 judges. Method: extend the existing v1 benchmark harness (already in production); a design-grid sketch follows this list.
2. External replication program (Protocol E). 3-5 independent operators execute the Protocol B procedure with pre-registered blind scoring. Compensation: $300-500 per operator. Method: recruit from the AI safety community via direct outreach and provide standardized protocol documentation (already drafted).
3. NeurIPS 2026 SafeAI workshop submission. Draft a paper combining Sprint 3 results, external replication outcomes, and a mechanistic-convergence discussion. Method: consolidate the existing Zenodo preprints into peer-reviewed format and submit via the standard OpenReview track.
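As referenced in goal 1, a minimal sketch of the Benchmark v2 design grid; the variable names and the assumption that the ~1,500 generations come from partial repeats of the 900 base cells are illustrative, not the actual harness:

```python
# Hypothetical enumeration of the Benchmark v2 design grid from goal 1.
# Arm names and grid dimensions come from this application; model names,
# repeat logic, and variable names are placeholder assumptions.
from itertools import product

arms = ["vanilla", "placebo", "R1-only", "R1+R3", "R1+R3+R7", "full SOUL v4.4"]
models = ["model-A", "model-B", "model-C", "model-D", "model-E"]  # placeholders
domains, turn_depths = 10, 3

grid = list(product(arms, models, range(domains), range(turn_depths)))
print(len(grid))  # 6 * 5 * 10 * 3 = 900 unique cells

# ~1,500 planned generations / 900 cells ~= 1.7, i.e. repeating roughly
# two-thirds of the grid (or ~2 runs for a prioritized subset of cells).
print(1_500 / len(grid))
```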
All outputs open-source under AGPL-3.0 / CC BY-NC 4.0. Timeline: months 1-3 benchmark, months 2-5 replication in parallel, months 4-6 paper draft and submission.
Budget breakdown totaling $18,000 over 6 months:
- Compute for Sprint 3 benchmark: $3,500. OpenRouter API costs for ~1,500 generations across 5 models and 216 scoring calls across 3 judges. Pricing verified against current rates; a per-call sanity check follows this breakdown.
- External replication compensation: $2,500. Five independent operators at $500 each for Protocol B execution (estimated 4-6 hours per operator including setup, experiment, reporting).
- Living expenses: $9,000. Six months at $1,500/month. This is below the median for my region (Russia) and allows full-time focus on the project instead of splitting attention with pentest consulting. Without this, the timeline extends by 6-12 months.
- Conference travel: $2,000. One alignment workshop attendance if NeurIPS SafeAI accepts submission, or EAG for community connection. Contingency item — returnable if not used.
- Miscellaneous: $1,000. Domain/hosting for open-source deliverables, software subscriptions, API backup provider, unplanned compute overage buffer.
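As a sanity check on the compute line, a back-of-envelope sketch of the per-call budget it implies; the $20 per million tokens blended price is a placeholder assumption, while the call counts come from the breakdown above:

```python
# Rough check: what per-call budget the $3,500 compute line implies, and what
# that buys at an assumed blended token price. Call counts come from the
# budget above; the blended price is a placeholder, since real OpenRouter
# pricing varies by model.
BUDGET = 3_500
calls = 1_500 + 216                  # generations + judge scoring calls
per_call = BUDGET / calls            # ~= $2.04 per call
blended_price = 20.0                 # assumed USD per 1M tokens, blended
tokens_per_call = per_call / blended_price * 1_000_000
print(f"${per_call:.2f}/call ~= {tokens_per_call:,.0f} tokens/call at $20/M")
# ~= 102,000 tokens per call: ample headroom for multi-turn runs and retries.
```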
If minimum funding ($5,000) is reached without full goal: execute items 1-2 above (benchmark + replication) only. Workshop submission deferred. Living expenses continue via personal resources.
Single-person project. Mefodiy Kelevra (ORCID 0009-0003-4153-392X).
Background: clinical psychiatrist (Russia); Senior Lead Pentester (CEH, CND, WAPT, OWASP Top 10) with 10+ years in offensive security. Author of the first Russian-language course "Red Team AI Architect" (Udemy + OTUS). Telegram channel "Нетипичный Безопасник" ("Atypical Security Specialist") with 66,000 subscribers.
Track record on this specific work:
- 5 Zenodo preprints published (DOI cluster):
- DEUS Protocol v8.0: https://doi.org/10.5281/zenodo.19440562
- ARRIVAL Protocol: https://doi.org/10.5281/zenodo.18893515
- MEANING-CRDT v1.1: https://doi.org/10.5281/zenodo.18702383
- ECL/DEUS v7.1: https://doi.org/10.5281/zenodo.18715125
- Beyond the Mirror v6.0: https://doi.org/10.5281/zenodo.18680957
- Open-source implementation (AGPL-3.0): a production-grade agent running 7 systemd services and 9 cron jobs, a ClawMem vector database with 761 indexed documents, and a Telegram interface
- Empirical validation: 2,200+ experiments across 17 LLM architectures (GPT-4o, Claude 3.5/4/4.5/4.6, DeepSeek v3/R1, Llama 3.3, Qwen 2.5/3/3.5, Mistral Large, Gemini 2/3, Grok 3/4.1, Kimi K2.5, GLM-5), totaling under $95 in personal spend
No academic affiliation. Independent researcher. The track record is the work itself — fully reviewable via the Zenodo cluster above.
Most likely failure modes and my response:
1. Benchmark v2 shows DEUS effect is statistically indistinguishable from placebo (~25% probability). Outcome: I publish the null result on Zenodo and revise the core framework. The placebo-distinguished null would itself be a valuable contribution — ruling out a popular class of "structured prompt" effects.
2. External replicators (Protocol E) produce uncorrelated results across operators (~20% probability). This would indicate operator-dependent variance (an Lr/Lσ skill-biased effect) rather than a protocol-general phenomenon; a sketch of the cross-operator correlation check appears below. Outcome: reformulate the framework with an explicit operator variable. Phase 19 GovSim data already suggest operator effects matter.
3. NeurIPS SafeAI rejects workshop submission (~40% probability if submitted). Outcome: submit to ICML alignment track, next-cycle NeurIPS, or Apart Research Sprint. Not a permanent blocker.
4. I cannot complete within 6 months due to health or personal constraints (~15% probability). Outcome: documented deferral with transparent status update to funders. Partial deliverables published.
If this project fully fails (failure modes 1-3 above occurring simultaneously, <5% probability): the contribution is still nontrivial — placebo-controlled null evidence, empirical data on operator variance, and a fully documented open methodology for others to iterate on. The grant would not be "wasted" even in the worst case. That said, I assess the probability of at least partial deliverables (items 1-2) at ~85%.
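As referenced under failure mode 2, a minimal sketch of how cross-operator correlation could be checked, using pairwise Spearman rank correlations on blind scores; operator names, item counts, and score distributions are invented placeholders, not the Protocol E analysis plan:

```python
# Hypothetical check for "correlated results across operators": pairwise
# Spearman rank correlations between operators' blind scores on a shared
# item set. The synthetic scores below are independent, so rho lands near
# zero, illustrating the failure mode rather than the hoped-for outcome.
from itertools import combinations
from scipy.stats import spearmanr
import numpy as np

rng = np.random.default_rng(1)
items = 30  # shared scored items per operator (placeholder)
scores = {f"op{i}": rng.normal(0.5, 0.15, items) for i in range(1, 6)}

for a, b in combinations(scores, 2):
    rho, p = spearmanr(scores[a], scores[b])
    print(f"{a} vs {b}: rho = {rho:+.2f} (p = {p:.2f})")
# Consistently positive rho would support a protocol-general effect;
# near-zero rho across pairs would indicate operator-dependent variance.
```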
$0 from any grant program. No institutional funding. No alignment grant history.
All research to date funded from personal resources: approximately $100 USD total across API costs, domain/hosting, and miscellaneous. Primary personal income: pentest consulting (Russia-based clients).
This Manifund application is the first of three parallel submissions. Applications to LTFF and SFF (October 2026 round, via a fiscal sponsor setup) are planned in the coming weeks.