LLM agents increasingly choose which tool to call from a large registry, and knowing why the agent picked the tool it did is a precondition for safe deployment: incident analysis, capability auditing, and red-teaming all need a routing signal humans can actually inspect. Today that signal is usually a cosine-similarity score inside a 1,536-dimensional embedding space.
I’ve built and shipped Meridian, an open-source MCP server that replaces opaque embedding routing with a deterministic orbital classifier: every candidate receives a physics signature, a celestial class, and a one-line decision rule explaining why it ranked where it did.
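To make "deterministic and inspectable" concrete, here is a minimal sketch of the shape of such a decision. Everything in it is hypothetical: the keyword-overlap scorer, the class names, and the interface are illustrative stand-ins, not Meridian's actual signature or classifier.

```typescript
// Hypothetical sketch only: Meridian's real classifier derives a physics
// signature; this stand-in uses keyword overlap to show the output shape.
interface RoutingDecision {
  skill: string;          // candidate tool/skill name
  score: number;          // deterministic, recomputable by hand
  celestialClass: string; // coarse bucket, e.g. "planet" = strong match
  rule: string;           // the one-line explanation a human can audit
}

function route(task: string, skills: Map<string, string[]>): RoutingDecision[] {
  const taskTerms = new Set(task.toLowerCase().split(/\W+/).filter(Boolean));
  const decisions: RoutingDecision[] = [];
  for (const [skill, keywords] of skills) {
    const hits = keywords.filter((k) => taskTerms.has(k.toLowerCase()));
    const score = hits.length / Math.max(keywords.length, 1);
    decisions.push({
      skill,
      score,
      celestialClass: score > 0.5 ? "planet" : score > 0 ? "asteroid" : "debris",
      rule: hits.length > 0
        ? `matched ${hits.length}/${keywords.length} keywords: ${hits.join(", ")}`
        : "no keyword overlap with task",
    });
  }
  return decisions.sort((a, b) => b.score - a.score);
}
```

The point is the output contract, not the scorer: the same inputs always produce the same ranking, and the `rule` string is the audit trail that a bare cosine score cannot provide.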
Live now:
MCP endpoint: mcp.ask-meridian.uk
GitHub: Meridian MCP Repository
Meridian v2.1.0 already ships with:
OAuth 2.1 + PKCE
stdio + Streamable-HTTP transports
deterministic orbital routing
47 unit tests
npm package + GHCR image
Cloudflare Worker deployment
GitHub Actions CD pipeline
This $5,000 grant funds a public benchmark for tool-routing failures:
a labelled routing dataset
a two-judge evaluation matrix
an open-source eval harness
and a reproducible write-up
The goal is to make tool-routing failure rates a measurable, comparable, and citable metric, in the same way perplexity standardized language-model evaluation.
1. Labelled task→skill dataset (~500 pairs)
Tasks spanning coding, research, operations, and creative domains. Each task includes:
one correct skill
four distractors
labels from paid human annotators via Prolific or Surge AI
Published publicly on HuggingFace; a possible record shape is sketched below.
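A dataset record could look like the following; the field names and schema are my illustration, not the published HuggingFace format.

```typescript
// Illustrative record shape for the labelled task→skill dataset.
interface RoutingExample {
  id: string;
  task: string;                                  // natural-language task
  domain: "coding" | "research" | "operations" | "creative";
  correctSkill: string;                          // the one right answer
  distractors: [string, string, string, string]; // four plausible wrong picks
  annotatorAgreement?: number;                   // fraction of labellers agreeing
}

const example: RoutingExample = {
  id: "ops-0042",
  task: "Roll back the last deployment and notify the on-call channel.",
  domain: "operations",
  correctSkill: "deployment_rollback",
  distractors: ["deploy_service", "read_logs", "create_incident", "send_email"],
  annotatorAgreement: 0.8,
};
```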
2. Two-judge evaluation matrix
Routing judged by:
Anthropic Sonnet 4.6
xAI Grok-4
Plus:
self-hosted BGE-large-en embedding baseline running on Modal
This creates a reproducible comparison (see the sketch after this list) between:
frontier-model routing
embedding-based routing
deterministic routing
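Under the hood the comparison reduces to a single scoring loop; the sketch below assumes every system under test is wrapped as a function that picks one skill from a fixed candidate list, which is my framing rather than a finalized harness design.

```typescript
// Each judge or router is reduced to the same signature.
type Router = (task: string, candidates: string[]) => Promise<string>;

interface EvalExample {
  task: string;
  correctSkill: string;
  distractors: string[];
}

async function accuracy(pick: Router, dataset: EvalExample[]): Promise<number> {
  let correct = 0;
  for (const ex of dataset) {
    // A real run would shuffle candidate order per example so LLM
    // judges are not rewarded for position bias.
    const candidates = [ex.correctSkill, ...ex.distractors];
    if ((await pick(ex.task, candidates)) === ex.correctSkill) correct++;
  }
  return correct / dataset.length;
}
```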
3. Open-source evaluation harness (CLI)
One-command evaluation (see the adapter sketch below) against:
Meridian
LangChain routers
LlamaIndex routers
vanilla embedding systems
MCP-compatible routing systems
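One plausible shape for that harness is a single adapter contract every framework implements; the interface and the cosine baseline below are an assumption about the design, not the shipped CLI.

```typescript
// Hypothetical adapter contract: Meridian, LangChain, LlamaIndex, and
// plain embedding routers all plug into the harness through this shape.
export interface RouterAdapter {
  name: string;
  route(task: string, candidates: string[]): Promise<string>;
}

// A vanilla cosine-similarity baseline built from any embed() function,
// e.g. a self-hosted BGE-large-en endpoint.
export function embeddingAdapter(
  embed: (text: string) => Promise<number[]>,
): RouterAdapter {
  const cosine = (a: number[], b: number[]): number => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  return {
    name: "embedding-baseline",
    async route(task, candidates) {
      const taskVec = await embed(task);
      const scored = await Promise.all(
        candidates.map(async (c) => [c, cosine(taskVec, await embed(c))] as const),
      );
      return scored.sort((a, b) => b[1] - a[1])[0][0];
    },
  };
}
```

The one-command invocation would then be something like `meridian-eval --adapter langchain --dataset <hf-path>` (hypothetical flags).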
4. Public write-up
A LessWrong / Alignment Forum post with:
reproducible code
benchmark methodology
routing analysis
published dataset
Dataset downloadable on HuggingFace by month 3
Harness runnable externally with one command by month 4
At least one external framework adopts or cites the eval within 6 months
Cross-classifier results produce publishable insight regardless of outcome:
If deterministic routing wins → interpretable routing is viable
If it loses → interpretable routing carries measurable capability cost
Anthropic + xAI API credits (judges + baselines) — $1,500
Human labelling (~500 pairs) — $1,400
Starlink hardware + 4 months service — $950
Cloudflare Workers Paid + GitHub Models inference — $400
GPU compute (Modal / Lambda) — $300
Buffer (~10%) — $450
No salary or stipend is included. Development work is performed independently alongside contracting income.
Reduced scope:
single judge (Sonnet 4.6 only)
~250 labelled pairs
no Starlink reliability layer
Still useful, but less reproducible and less citable.
Independent solo engineer.
Meridian MCP
GitHub Repository
Lens — WebXR Vision Lab pairing SmolVLM + Meridian routing
Lens Repository
lens.ask-meridian.uk
Photon — photonic retrieval router using the Meridian backend
Photon Repository
Writing & architecture notes
ask-meridian.uk/blog
Published work includes:
classifier walkthroughs
deterministic routing analysis
OAuth operator-pays architecture
Cloudflare Workers vs GitHub Pages deployment trade-offs
Embedding or LLM-based routing may outperform Meridian.
Outcome:
still produces a useful public benchmark
still yields a publishable result
clarifies whether interpretability costs capability
Routing datasets are difficult to label reliably.
Mitigation:
second-pass review
manual validation sampling
reduced dataset size if noise exceeds threshold
Possible outcome:
benchmark remains useful as a public reference
still functions as an internal regression metric for Meridian
Lower funding reduces:
dataset size
judge diversity
reproducibility
The project still ships regardless of funding outcome.
$0 external funding raised.
All work to date has been self-funded through contracting income, with approximately £1,500 (~$1,900 USD) spent across 2025–2026 on:
Cloudflare Workers Paid
GitHub Models inference overage
domains/DNS
monitoring
infrastructure