BMC is a deterministic, fully-inspectable simulation of memetic-affective agents with no engineered scalar reward function — the agent's objective is an inspectable homeostatic set-point (restore internal affective balance), where the gradient is zero at the set-point and reverses past it. Because no engineered channel exists whose maximization yields an ever-better outcome, the canonical wireheading target is structurally absent rather than penalized post-hoc (an emergent gauge-level route is disclosed as a caveat below, not a contradiction of this). This project delivers a technical note formalizing why, plus two pre-registered stress tests of the two open failure modes I already disclose as honest limits.
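As an illustration only, the set-point property described above can be sketched in a few lines of Rust. The quadratic deviation penalty, the `SetPoint` type, and the `step` function are assumptions made for this example; they are hypothetical names and do not describe the BMC engine's internals. The sketch shows an objective whose gradient is zero at the set-point and changes sign on either side of it, so pushing the state further in one direction stops paying off once the set-point is passed.

```rust
/// Hypothetical homeostatic objective (illustrative assumption, not engine code).
/// The agent's "drive" is the negative squared deviation from a target value.
#[derive(Debug, Clone, Copy)]
pub struct SetPoint {
    pub target: f64,
}

impl SetPoint {
    /// Objective value: 0.0 exactly at the set-point, negative everywhere else.
    pub fn objective(&self, state: f64) -> f64 {
        -(state - self.target).powi(2)
    }

    /// Gradient of the objective with respect to the state.
    /// Positive below the set-point, zero at it, negative above it.
    pub fn gradient(&self, state: f64) -> f64 {
        -2.0 * (state - self.target)
    }
}

/// One gradient-ascent step on the objective with a fixed step size `rate`.
pub fn step(sp: &SetPoint, state: f64, rate: f64) -> f64 {
    state + rate * sp.gradient(state)
}
```

Under these assumptions, repeated calls to `step` with a small `rate` move the state toward `target` from either side and then stop. A monotone scalar reward would instead have a gradient that never changes sign, which is the property the paragraph above says is absent.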
Goals: (1) write a self-contained technical note formalizing the three-ingredient reward-hacking analysis — an external proxy maximand, unbounded optimization pressure, and a proxy detachable from its referent — and show which of the three this architecture structurally lacks; (2) run two pre-registered stress tests targeting the two open failure modes disclosed as honest limits below: gauge-level interoceptive wireheading, and Cultural-Memory parasite-meme contamination; (3) scope (not yet execute) the next step that would convert this from an architectural claim into a behavioral one — a decoupled signal channel with closed discriminative feedback, letting a capable agent actually attempt to game a proxy and fail or succeed.
How I'll achieve them: the engine is deterministic and every internal variable is inspectable, so this is a mechanistic exercise, not a training run — I read the architecture off directly, then pre-register and run the two stress tests the same way I've run prior experiments in this program (pre-registration before compute, independent replication before banking a result).
The $20,000 buys 6 months of my time to: write the technical note formalizing the three-ingredient analysis with the symbol-grounding assay as its falsifiable core; run and analyze the two pre-registered stress tests; and publish all of it, including null results reported as such if the stress tests find the architecture's disclosed failure modes ARE exploitable. No engine internals are disclosed — only results, methods, and published papers. No compute cluster is needed; this runs on my existing local machine.
Solo: no team, no organization. 5 published papers with DOIs on Zenodo (bmc-theory.org/publications): the theoretical framework; a working-memory-capacity derivation; a communication-pressure/memetic-replication study; an emergent-language paper (compositional signaling arising from an empty starting state, no communication-objective training); and an integration paper extending the same empty-start setup to show cultural convergence and functional meta-cognition, again without reward signals or pretraining. Multiple headline findings were independently blind-replicated before I banked or published them. My published papers already report negative findings directly where that's what was found (communication is survival-neutral; reception without expression is indistinguishable from isolation, p=0.64), and I maintain dated internal pre-registrations, several of which have resolved as banked nulls. Engine: a deterministic, from-scratch Rust simulation (300+ tests, 103 automated gate checks), built and maintained solo since 2026 without institutional support.
Most likely failure mode: time — this is solo, unpaid-to-date work alongside other obligations, so the technical note or one of the two stress tests could slip past 6 months or not reach publication quality. If that happens, I'd publish whatever is complete (the note, or one stress test) rather than withhold a partial result. A separate, non-failure outcome worth naming explicitly: the stress tests could find that the disclosed failure modes (gauge-level interoceptive wireheading, Cultural-Memory parasite-meme contamination) ARE exploitable under pressure — that would be a genuine negative finding about this architecture's safety properties, not a failure of the project, and I would report it as such, the same way I've reported prior null results in this program.
$0 raised in the last 12 months. I have a parallel application pending with the Long-Term Future Fund (submitted 2026-07-01, ask $44,000) for the same underlying architecture and results, but scoped and framed differently: it is a broader 9-month research stipend covering my full living costs while I do this work, whereas this $20,000 ask narrowly funds the 6-month technical note and the two stress tests themselves. These are alternative asks, not stacked: if both were funded, the budgets would be reconciled to the actual work, not paid twice for the same deliverables. No other funding applications in the last 12 months.