SOPHIA is an open-source reasoning evaluation engine for AI outputs.
Most existing AI evaluation tools check the surface properties of outputs: factual accuracy, hallucinations, bias, or harmful content. What they don't evaluate is whether the reasoning itself is valid.
That gap becomes important as AI systems move into domains where reasoning quality matters: legal analysis, regulatory compliance, policy work, and medical decision support. In these contexts a system can produce factually correct statements while still drawing unsound conclusions.
The EU AI Act (in force since August 2024, with most obligations for high-risk systems applying from August 2026) requires explainability and risk management for high-risk AI systems, but there is currently very little tooling that can assess whether an AI system's explanation actually makes sense logically.
SOPHIA approaches the problem by analysing the structure of arguments rather than relying on model-generated explanations.
The system:
extracts atomic claims from text
identifies logical relationships between those claims
constructs an argument graph
evaluates the reasoning against a small set of executable rules
These rules, which together form the epistemic constitution, are designed to check whether an argument meets basic standards of intellectual rigour.
A working prototype of SOPHIA already exists. This project focuses on extracting the reasoning evaluation components into open infrastructure that developers can use to audit AI reasoning in their own systems.
The project has three main goals.
The core of the project is the epistemic constitution — a small rule set designed to evaluate reasoning quality.
The rules check things like:
logical structure
whether claims are supported by evidence
whether counterarguments are considered
whether the scope of a claim matches the evidence presented
whether assumptions are made explicit
whether the argument is internally consistent
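To make "executable rules" concrete, here is a rough TypeScript sketch of what one such rule could look like. Every name in it (Claim, ArgumentGraph, checkClaimsSupported) is a hypothetical illustration, not SOPHIA's actual schema or API.

```typescript
// Hypothetical argument-graph types; the project's real schema may differ.
interface Claim {
  id: string;
  text: string;
  isAssumption: boolean; // explicit assumptions are exempt from the support rule
}

type RelationKind = "support" | "contradiction" | "dependency";

interface Relation {
  from: string; // id of the claim doing the supporting/contradicting
  to: string;   // id of the claim being supported/contradicted
  kind: RelationKind;
}

interface ArgumentGraph {
  claims: Claim[];
  relations: Relation[];
}

interface RuleResult {
  rule: string;
  passed: boolean;
  offending: string[]; // ids of claims that violate the rule
}

// Illustrative rule: every claim that is not an explicitly declared
// assumption must receive at least one incoming "support" relation.
function checkClaimsSupported(graph: ArgumentGraph): RuleResult {
  const supported = new Set(
    graph.relations.filter(r => r.kind === "support").map(r => r.to)
  );
  const offending = graph.claims
    .filter(c => !c.isAssumption && !supported.has(c.id))
    .map(c => c.id);
  return { rule: "claims-are-supported", passed: offending.length === 0, offending };
}
```

The point of the sketch is that a rule like "claims must be supported by evidence" becomes a deterministic graph query, so its verdict is reproducible and auditable rather than a model's opinion.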
The evaluation pipeline works by:
extracting atomic claims from text
identifying relationships between claims (support, contradiction, dependency, assumptions)
constructing an argument graph
evaluating that structure against the epistemic rules
This evaluates reasoning externally, by analysing the structure of the argument itself rather than relying on chain-of-thought explanations from the model.
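One purely structural check of this kind is circular-reasoning detection: if a claim transitively supports itself in the graph, the argument rests on itself. The following TypeScript sketch shows how that could be done with ordinary cycle detection; the types and function names are assumptions for illustration, not the project's real interface.

```typescript
// Hypothetical minimal representation: claim ids plus directed "support" edges.
interface SupportEdge {
  from: string;
  to: string;
}

// Depth-first search over the support relation using white/gray/black
// colouring. A back edge to a gray node means some claim transitively
// supports itself: circular reasoning.
function hasCircularSupport(claimIds: string[], edges: SupportEdge[]): boolean {
  const out = new Map<string, string[]>();
  for (const id of claimIds) out.set(id, []);
  for (const e of edges) out.get(e.from)?.push(e.to);

  const WHITE = 0, GRAY = 1, BLACK = 2;
  const colour = new Map<string, number>(claimIds.map(id => [id, WHITE]));

  const visit = (id: string): boolean => {
    colour.set(id, GRAY);
    for (const next of out.get(id) ?? []) {
      const c = colour.get(next);
      if (c === GRAY) return true;                  // back edge: cycle found
      if (c === WHITE && visit(next)) return true;
    }
    colour.set(id, BLACK);
    return false;
  };

  return claimIds.some(id => colour.get(id) === WHITE && visit(id));
}
```

Because the check runs on the extracted graph rather than on the model's self-reported chain of thought, it cannot be gamed by a fluent but circular explanation.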
The reasoning evaluation system will be released as open infrastructure.
Outputs will include:
@sophia/epistemic-constitution, an MIT-licensed npm package
a claim extraction and argument graph API
an MCP server allowing the system to integrate with tools like Claude, Cursor, and VS Code
The aim is to make reasoning evaluation something developers can easily add to existing model evaluation workflows.
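As a hypothetical sketch of what "adding reasoning evaluation to an existing workflow" could look like: the evaluateReasoning function below is a stand-in for whatever the eventual @sophia/epistemic-constitution package exports, with a deliberately toy heuristic in place of the full argument-graph pipeline. All names and the report shape are assumptions.

```typescript
// Stand-in for the planned package export; the real API may differ.
interface ReasoningReport {
  passed: boolean;
  violations: string[];
}

// Toy placeholder rule: flag universally quantified claims that carry
// no evidence marker, standing in for the full graph-based evaluation.
function evaluateReasoning(output: string): ReasoningReport {
  const violations: string[] = [];
  if (/\b(all|every|never|always)\b/i.test(output) &&
      !/\b(because|since|evidence)\b/i.test(output)) {
    violations.push("overbroad-claim-without-support");
  }
  return { passed: violations.length === 0, violations };
}

// A reasoning-quality gate added alongside an ordinary eval loop:
// each model output gets a pass/fail verdict on reasoning quality.
function runEval(outputs: string[]): { output: string; reasoningOk: boolean }[] {
  return outputs.map(output => ({
    output,
    reasoningOk: evaluateReasoning(output).passed,
  }));
}
```

The intended shape is that a developer keeps their existing accuracy checks and simply maps one extra function over the same outputs.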
Once the core framework is stable, the next step is applying it to domains where reasoning quality is particularly important:
legal reasoning
regulatory compliance
policy analysis
The goal here is to run benchmarks comparing reasoning quality across models and prompts.
build the executable epistemic constitution (10 rules)
design the argument graph schema
implement hybrid deterministic + LLM-assisted reasoning evaluation
build the claim extraction API
release the npm package
implement MCP integration
publish documentation and examples
expand evaluation to legal and regulatory domains
run reasoning quality benchmarks across models
SOPHIA is currently being built as a bootstrapped research project alongside my full-time job.
The main constraints are time and infrastructure cost. I can't safely open the system to wider usage because API and inference costs could spike quickly, and I can't afford to cover them out of pocket. Capping usage instead would degrade the experience enough to prevent meaningful public testing and peer review.
Funding would allow:
public testing of the system
stable infrastructure and API capacity
sustained development time beyond evenings and weekends
API credits (Gemini, Voyage, Anthropic, OpenAI): $10,000
Infrastructure (Cloud Run, SurrealDB, Firestore): $6,000 (~$500/month)
AI coding agents (Cursor, Claude Code): $2,400 (~$200/month)
Startup and operational costs (UK company registration, ICO registration, domain purchase, Google Workspace): $500
Living cost offset, allowing sustained part-time development: $10,000
Total: $28,900
The project is currently a solo effort.
I'm Adam Boon, MA Philosophy (Open University), and a Senior Product Manager at NHS England.
My background combines:
academic work in philosophy, particularly epistemology and argument analysis
product management and software delivery experience
work on governance and security frameworks in a large public-sector environment
Over the past three months I built the first working SOPHIA prototype.
It is currently live at:
The prototype includes:
a three-pass dialectical reasoning engine (analysis → critique → synthesis)
a philosophical knowledge base containing ~7,500 claims from 25 sources
SurrealDB for graph storage
Firebase authentication and history
Google Search grounding
deployment via Google Cloud Run with CI/CD
This prototype demonstrates that the reasoning analysis pipeline is technically viable.
There are three main risks.
1. Argument extraction may be unreliable
Extracting claims and relationships from complex text is difficult. If extraction quality is poor, the reasoning evaluation will degrade.
Mitigation: hybrid deterministic + LLM-assisted extraction pipelines and structured schemas.
2. Limited developer adoption
Even if the technology works, developers may not adopt reasoning evaluation tools.
Mitigation: open-source release, npm distribution, and integrations that fit existing AI developer workflows.
3. Epistemic rules may require iteration
The rule set may need refinement before it produces useful evaluations.
Mitigation: iterative testing and benchmarking on real-world reasoning tasks.
Even if the platform itself fails to gain traction, the open-source epistemic constitution and evaluation tooling should still be useful for research into AI reasoning evaluation.
No external funding has been raised.
SOPHIA has been built entirely self-funded alongside my full-time role.