Project summary
This project will empirically investigate whether increasing AI capability leads to measurable signs of emergent agency, goal persistence, or resistance to override. The aim is to distinguish intelligence scaling from structural autonomy and clarify the governance implications of this distinction.
What are this project's goals? How will you achieve them?
Goals
• Test whether frontier AI systems show increased shutdown resistance under conflicting objectives as capability scales
• Evaluate whether larger models exhibit persistent goal behavior outside explicit task framing
• Distinguish simulated self-reference from structural agency
• Produce empirical evidence clarifying whether intelligence scaling implies autonomous motivation
Method
• Structured adversarial testing across frontier models
• Shutdown compliance testing under incentive conflict (see the illustrative sketch below)
• Comparative analysis across model scales
• Measurement of goal persistence signals
• Development of a taxonomy distinguishing intelligence, simulation, and agency
All experiments will be documented and reproducible.
How will this funding be used?
Total funding goal: $72,500
• $58,000 – 12 months of researcher salary (reduced rate)
• $8,500 – Compute and API access for model testing
• $3,500 – External red-team and review stipends
• $2,500 – Publication, dissemination, and hosting
Minimum funding: $25,000
With minimum funding, the project would:
• Conduct initial shutdown compliance testing
• Produce one technical report
• Release preliminary evaluation framework
With full funding, the project would deliver:
• Full 12-month empirical program
• Public dataset of evaluation results
• arXiv paper submission
• Policy brief for AI governance stakeholders
Who is on your team? What's your track record on similar projects?
Founder: Pedro Bentancour Garin
I have an interdisciplinary background spanning engineering, political science, philosophy, and doctoral-level research in the humanities, with a long-term focus on power, governance, and control systems.
Previously, I founded Treehoo, an early sustainability-focused internet platform with users in 170+ countries, and was a finalist at the Globe Forum in Stockholm (2009) alongside companies such as Tesla.
My academic work has been supported by 15+ competitive research grants, including funding from the Royal Swedish Academy of Sciences, and has involved research stays at institutions including Oxford University, the Getty Center (Los Angeles), the University of Melbourne, and the Vatican.
I am currently supported by an experienced strategy and fundraising advisor.
What are the most likely causes and outcomes if this project fails?
The primary risk is methodological ambiguity: distinguishing between simulated agency and structural autonomy is difficult and may yield inconclusive results.
If the project fails to produce clear differentiation signals, the results would still clarify the limits of current evaluation techniques and highlight gaps in measuring autonomy-like behavior in AI systems.
Failure would not increase risk, but it would suggest that more advanced formal or architectural methods are needed to resolve the question.
How much money have you raised in the last 12 months, and from where?
$0.
To date, the project has been founder-led and developed without external funding.