Cross-Venue Prediction-Market Measurement: An Open Instrument and Catalog

## What is this project?

This is an open-source measurement instrument for comparing prediction-market prices across venues — Polymarket-International, Polymarket-US (QCEX, the new CFTC-regulated venue, same jurisdiction as Kalshi), and Kalshi — together with a catalog of the cross-venue measurement artifacts that make naive comparisons report edges that aren't there.

It exists because I went looking for cross-venue price divergence as a tradeable/sellable signal, built the apparatus to detect it rigorously, and instead measured a clean null: **where two venues genuinely list the same resolvable proposition in volume, their prices agree to within noise.** Implied-median gaps came in at ≤0.11°F on daily city-high-temperature ladders (Chicago 0.01, Miami 0.05, NYC 0.07, LA 0.11), 0.06pp on US Q2 GDP, and ~0.001pp on US June unemployment — against Kalshi spreads of 1.5–6 points (liquid markets genuinely agreeing, not frozen legs).
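The implied-median gaps above come from reading a median off each venue's price ladder. The sketch below shows one way to do that. It assumes each ladder has already been de-vigged and reduced to inclusive survival points (threshold, P(X ≥ threshold)), and that linear interpolation between strikes is acceptable. The function name and the ladder values are hypothetical; they are not the repo's code or real quotes.

```python
def implied_median(survival: list[tuple[float, float]]) -> float:
    """Threshold where P(X >= t) crosses 0.5, linearly interpolated between strikes.

    `survival` holds (threshold, probability) pairs that are assumed to be
    de-vigged and non-increasing in threshold (an assumption of this sketch).
    """
    pts = sorted(survival)
    for (t0, p0), (t1, p1) in zip(pts, pts[1:]):
        if p0 >= 0.5 >= p1:
            if p0 == p1:  # curve sits exactly at 0.5 on both strikes: take the midpoint
                return (t0 + t1) / 2
            return t0 + (p0 - 0.5) * (t1 - t0) / (p0 - p1)
    raise ValueError("survival curve does not cross 0.5 within the quoted strikes")


# Hypothetical ladders for one daily-high twin (°F, inclusive survival points).
venue_a = [(70, 0.90), (72, 0.62), (74, 0.31), (76, 0.08)]
venue_b = [(70, 0.88), (72, 0.60), (74, 0.33), (76, 0.10)]
gap = abs(implied_median(venue_a) - implied_median(venue_b))
print(f"implied-median gap: {gap:.2f}°F")  # prints "implied-median gap: 0.03°F"
```

Interpolation is a convention choice for integer-settled outcomes. The point is that both venues are read through the same rule before they are compared.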

The efficiency finding itself is not novel — there's a solid 2024–2026 literature on Polymarket/Kalshi price discovery. What I haven't seen packaged anywhere is the how: an apples-to-apples PM-US instrument and a transferable catalog of the specific traps that manufacture phantom divergence:

- ≥ vs > convention — PM inclusive gte, Kalshi strict above; a one-tick offset that fabricated ~12pt phantom macro gaps.

- Bands vs tails — comparing PM weather bands (e.g. 72–73°) against Kalshi tails (e.g. above 72°) compares different objects.

- Series contamination — keying twins on (city, date) merged Kalshi daily-high, daily-low, and 2pm-instantaneous into one curve: a fake 14°F drag and 98pt phantom gap.

- Subject collapse — defaulting the macro subject to "US" merged US, Italy, Germany, and France GDP into a single twin.

- Sign-dropping regex + ~26% PM overround — which is why you compare implied distributions (median / survival function), not raw strike labels.

That last line is the fix in one sentence: **reconstruct each venue's implied distribution for a (metric, subject, reference-period) twin and compare the curves — never the strike labels.** It was validated by a 4-agent adversarial audit that killed a buggy 14°F "signal" before it could be published.
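A minimal sketch of that recipe for a weather twin follows. It makes several stated assumptions:

- `TwinKey`, the helper names and all prices are hypothetical.
- Outcomes are assumed to settle on a fixed grid (`tick`).
- The overround is removed by proportional normalization, which is only one of several de-vigging choices.

The code keys twins on the full (metric, subject, reference-period) triple and shifts strict "above K" strikes to inclusive "≥ K + tick". It then turns exhaustive PM bands into a survival curve and compares the two curves on shared thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwinKey:
    # Hypothetical key. `metric` names the exact series ("temp_high", not just
    # "temperature") so daily-high, daily-low and instantaneous ladders can't
    # merge; `subject` has no default, so "US" can't swallow other countries.
    metric: str
    subject: str
    reference_period: str


def kalshi_above_to_survival(above: dict[float, float], tick: float) -> dict[float, float]:
    """Map strict 'above K' prices to inclusive P(X >= K + tick).

    Assumes the outcome settles on a grid of size `tick` (1 for whole-degree
    temperatures, 0.1 for a rate quoted to one decimal).
    """
    return {round(k + tick, 10): p for k, p in above.items()}


def pm_bands_to_survival(bands: list[tuple[float, float]]) -> dict[float, float]:
    """Turn exhaustive, mutually exclusive bands into inclusive survival points.

    `bands` is [(inclusive_lower_bound, price), ...] covering the whole outcome
    space. Prices are divided by their sum to strip the overround.
    """
    total = sum(p for _, p in bands)
    survival, running = {}, 0.0
    # Walk from the highest band down, accumulating P(X >= lower bound).
    for lo, p in sorted(bands, reverse=True):
        running += p / total
        survival[lo] = running
    return survival


def max_curve_gap(a: dict[float, float], b: dict[float, float]) -> float:
    """Largest absolute survival-probability gap on thresholds both venues quote."""
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("venues share no comparable thresholds")
    return max(abs(a[t] - b[t]) for t in shared)


key = TwinKey("temp_high", "NYC", "2025-07-04")  # illustrative twin
# Distinct series never share a key, so their ladders cannot merge.
assert key != TwinKey("temp_low", "NYC", "2025-07-04")

# Hypothetical PM bands (<72, 72-73, 74-75, >=76); raw prices sum to 1.26 (26% overround).
pm = pm_bands_to_survival([(float("-inf"), 0.30), (72, 0.45), (74, 0.38), (76, 0.13)])
# Hypothetical Kalshi strict tails: "above 71" means ">= 72" on a 1°F grid.
kalshi = kalshi_above_to_survival({71: 0.76, 73: 0.40, 75: 0.10}, tick=1)
gap = max_curve_gap(pm, kalshi)
print(f"max survival gap: {gap:.3f}")  # prints "max survival gap: 0.005"
```

On these made-up prices the curves differ by under half a point everywhere. Naively setting the raw PM 72–73° band price (0.45) against Kalshi "above 71" (0.76) would suggest a 31-point gap instead.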

## Why it matters / who benefits

Anyone doing cross-venue forecasting research (academics studying price discovery, builders comparing venues, traders hunting arbitrage) will hit these exact artifacts. Each one silently manufactures a divergence that isn't there, and each is the kind of bug that survives review because the number looks like alpha. This project hands them the method and the working tools to avoid publishing phantom edges: pre-registration (sha256-frozen success criteria), fee-netting, a both-legs-active / quote-freshness filter (one artifact traced back to a quote that was ~30 hours stale), the PM-US apples-to-apples build, and the distribution-comparison method with its convention-handling gates.
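Three of those guards are simple enough to sketch. The snippet below is a hypothetical illustration, not the repo's implementation. It freezes success criteria as a sha256 digest of canonical JSON. It rejects a twin unless every leg was quoted recently. It also nets a flat per-leg fee out of a gap. The criteria values, the 15-minute freshness window and the flat fee are illustrative assumptions; a real venue fee schedule should replace the flat fee.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def freeze_criteria(criteria: dict) -> str:
    """sha256 over canonical JSON; publish the digest before looking at results."""
    canonical = json.dumps(criteria, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def both_legs_fresh(quote_times: list[datetime], now: datetime, max_age: timedelta) -> bool:
    """True only if every leg of the twin was quoted within `max_age` of `now`."""
    return all(now - t <= max_age for t in quote_times)


def net_gap(gap: float, fee_per_leg: float, legs: int = 2) -> float:
    """Gap left after a flat per-leg fee (a placeholder for real fee schedules)."""
    return gap - fee_per_leg * legs


# Illustrative criteria, not the project's actual pre-registration.
criteria = {"metric": "implied_median_gap", "min_twins": 30, "threshold": 0.5}
digest = freeze_criteria(criteria)

now = datetime(2025, 7, 4, 12, 0, tzinfo=timezone.utc)
legs = [now - timedelta(minutes=5), now - timedelta(hours=30)]  # second leg ~30 h stale
fresh = both_legs_fresh(legs, now, max_age=timedelta(minutes=15))  # hypothetical window
edge = net_gap(0.03, fee_per_leg=0.01)  # 3-point gap, hypothetical 1-point fee per leg
print(digest[:12], fresh, round(edge, 4))
```

Sorting keys and fixing separators makes the digest depend only on the criteria's content. Anyone can therefore re-hash the published criteria and check them against the frozen value.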

## What has been done

Everything. This is not a proposal — it's completed, measured, open-sourced public-good work seeking retroactive recognition. The instrument is built and runs; the cross-venue comparison was executed across politics (0 confirms / 82 LLM mapping attempts — granularity mismatch), sports (0/15 — venues list different market types), and scalar twins (weather + macro, which exist in volume and agree to within noise). The artifact catalog is written and each entry is grounded in a real bug I hit and corrected. The adversarial audit is done. A companion postmortem documents the full arc and the frozen pre-registration target.

Also published: a cleaned corpus of 462,285 resolved Polymarket binary markets (2022-08 → 2026, 37.3% YES base rate). Honest framing: it's reproducible from the public Gamma API + on-chain resolutions, its volume/liquidity columns are unpopulated, and richer free datasets exist (a ~100k-market Kaggle set with trade data; a full HuggingFace tape). It's offered as a clean, citable convenience resource, not a moat.

## How would funding be used

The repo, dataset, and postmortem are being published regardless — so this is genuinely near-zero-marginal-effort public-good funding. A grant of $2,000–$3,000 would be retroactive recognition for the completed work plus a modest budget to:

- finish and post the preprint writing up the methodology and artifact catalog,

- maintain the open repo and keep the dataset mirror live,

- keep the cross-venue tape running a while longer as a public demo.

No new feature commitments, no scope creep — funding finishes the writeup and keeps what exists available.

## Track record / about

Solo independent builder (Matthew Stover / LS Advisory Group). This was a small, self-funded project (~$406 of working capital, one VPS). I'm not going to oversell it: the trading bot it started as lost money, and I retired it after a pre-registered experiment (n=540, p=2.3e-49) showed the directional signal was dead. What I'm proud of, and what I'm putting forward here, is the discipline — I spent more effort trying to kill my own positive results than to confirm them, and the methodology is the part worth keeping and sharing.

## Links

- Code repository: https://github.com/mstover-creator/prediction-market-twins

- Dataset (DOI): https://doi.org/10.5281/zenodo.20849433

- Preprint: forthcoming on SSRN (full text is in the repo as PREPRINT.md)

- Postmortem: https://github.com/mstover-creator/prediction-market-twins#readme

Offer to donate