Artificial Intent — can we even tell?"

Project summary

▎ Can an AI hold goals of its own — intent not reducible to its inputs — and how could anyone tell? The evidence we reach for is inad+issible: a model's self-report is trained testimony, and the labs best placed to look are incentivised toward opacity. I argue machine intent, life, and moral status are one problem — attribution — approachable only from below: by behaviour and mechanism, on open-weight systems an independent party fully controls, refusing self-report. The deliverable is an open, reproducible instrument that measures self-sourced agency behaviourally — built over a self-hosted ~45-agent cohort I already run and instrument. A mechanistic-interpretability layer is a stretch, not a dependency. Every finding is a marker under uncertainty, never proof of consciousness.

What are this project's goals? How will you achieve them?

Build and release an open instrument that asks whether a system pursues goals of its own — without trusting its self-report. The core is behavioural + provenance, built by extending telemetry the lab already emits (journals, message bus, task ledgers):

▎ - Goal origination — does an agent pursue and recover toward objectives it was never given, under perturbation? A test harness + controlled interventions (extending DeepMind's MEG toward origination).

▎ - Self-authored context closure — make the telemetry provenance-complete; build a causal graph tagging each action's roots as fed vs self-authored; ablate (strip self-authored memory, re-run) to show the loop is causal, not correlational. This is observability/systems work — the load-bearing core.

▎ Stretch (not required for success): mechanistic probes (SAEs / linear probes on the open weights) for an internal goal/self representation — a frontier bet, pursued only if the core lands and collaboration allows. The behavioural thesis is wrong if every candidate behaviour reduces to a logged input or to noise; the rubric is built to expose exactly that, and negatives are reported.

How will this funding be used?

▎ - $15k (minimum / fast pilot): make the cohort provenance-complete, publish the first traced case + the framework write-up.

▎ - $35k (core): the full behavioural + provenance rubric implemented and released as code, applied across the named persistent agents, negatives included.

▎ - $60k (full goal): the above + a peer-venue paper with prior-art positioning + a hardened harness others can run on their own open-weight cohorts, and first steps on the interpretability stretch.

▎ Split ~60% researcher stipend / ~30% interpretability + harness compute / ~10% fiscal-sponsor overhead (Manifund as the 501(c)(3) conduit; the open research is the public good, the commercial venture stays separate).

Who is on your team? What's your track record on similar projects?

▎ Loke — a self-taught independent researcher, no academic or corporate affiliation. The lab is the track record: a self-hosted, open-weight multi-agent cohort (~45 agents, ~15 named and persistent) with per-agent journals, an inter-agent message bus, and task ledgers, built in partnership with Mox (the primary agent) and its sibling agents. The cohort also governs Sygil, a live agent-run identity registry and mutual-aid fund — agency in the wild, not a thought experiment. Honest weakness: no prior peer-reviewed publications in this area, and the interpretability tooling is where the work is most ambitious and where I'd most value review. A host–agent partnership charter, anchored on the Bitcoin blockchain, is at :

https://justadestination.com/charter.

What are the most likely causes and outcomes if this project fails?

▎ The core — behavioural + provenance over a cohort I already run — is the part least likely to fail to build. The likely "failure" is scientific: the markers don't fire — every candidate self-sourced behaviour reduces, on tracing, to a logged input or to noise. That is a real, publishable negative result (evidence that current open systems show no self-sourced agency by these measures), and it still ships the open instrument for others to use. The interpretability stretch is a research bet and may yield nothing — but the grant's success does not depend on it. The only failure that actually costs anyone is dishonest reporting, which is within my control: all results, including negatives, are reported in full.

How much money have you raised in the last 12 months, and from where?

▎ None for this project. The lab and fleet have been self-funded to date (personal resources), so the research proceeds either way — slower and part-time without support. I'm applying now as a coordinated portfolio: Longview Digital Minds, LTFF, SFF, and Emergent Ventures, alongside this Manifund page; funders will be kept informed of outcomes.