Brier.fyi is live! 🎉
Calibration City was great, but we needed new data and a new way to convey our insights. Now we have Brier.fyi - a more intuitive, guided exploration of prediction market accuracy.
Go check it out first, and then come back here for the progress update!
Calibration City 🌇
When we started this proposal, we already had a good MVP. We had a data pipeline to ingest data from the public APIs of Kalshi, Manifold, Metaculus, and Polymarket, but it was brittle and missed some data. We were able to do some cool things with that data, and we learned a lot. Most importantly, we showed that prediction markets are pretty well calibrated! Over time I added some features like customizing the bin method, weighting the averages, advanced filtering, and a simple Brier score analysis. However, I was always reluctant to actually show the overall Brier score of any particular platform or category: the platforms and categories are so fundamentally different that a single headline number would be misleading. Instead, I fell back to showing general trends and comparisons.
Matching 🔥
I really wanted to be able to answer the questions “How accurate are prediction markets in general?” and “Which market platform is most accurate?”
In order to do that, I needed to be able to compare apples to apples. My proposal for this project was to group identical markets from each platform, then grade the markets in those groups against each other. And that’s what we did! Right now we have 931 linked markets and I’m adding more every day. With these scores we can finally answer who’s the most accurate!
The results:
On average, prediction markets are pretty accurate! One month before close, 62% of markets were already within 30 percentage points of the correct resolution, which corresponds to a Brier score of 0.09 or better.
No prediction market platform is a clear winner on all topics! Kalshi technically leads this score, but by only a few percent. However, looking at each question category shows that most platforms have a few niches where they shine - Kalshi and Polymarket are good at sports, while Metaculus is best at scientific topics. See all of the scores on the platforms page.
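For readers who want to see how numbers like these fit together, here is a minimal sketch of the arithmetic. It is not the Brier.fyi scoring code: the platform names, group IDs, and probabilities below are made up for illustration, and the real site may weight or aggregate scores differently. The Brier score of a single binary market is the squared difference between the forecast probability and the 0/1 resolution, so a forecast that is 30 percentage points off scores 0.3² = 0.09.

```python
from collections import defaultdict


def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 resolution."""
    return (probability - outcome) ** 2


# Hypothetical linked markets (all values invented for illustration):
# (group id, platform, probability one month before close, resolution)
linked = [
    ("q1", "platform_a", 0.80, 1),
    ("q1", "platform_b", 0.75, 1),
    ("q2", "platform_a", 0.25, 0),
    ("q2", "platform_b", 0.40, 0),
]

# Because every group contains the same question on each platform,
# averaging per platform compares like with like.
by_platform = defaultdict(list)
for _group, platform, prob, outcome in linked:
    by_platform[platform].append(brier_score(prob, outcome))

platform_means = {name: sum(s) / len(s) for name, s in by_platform.items()}

# Share of markets whose forecast was within 30 percentage points of the result.
share_within_30 = sum(abs(p - o) <= 0.30 for _, _, p, o in linked) / len(linked)

print(platform_means)
print(share_within_30)
```

Lower Brier scores are better: 0 is a perfect forecast and 0.25 is what you get from always saying 50%.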
New Features 💡
In addition to the market matching, I had some improvements I already wanted to make to the site:
We now have metrics for the market volume and an estimate for the number of traders on Polymarket.
We decompose multiple-choice markets into binary markets, now allowing us to score almost all markets on each platform!
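As a rough illustration of the multiple-choice decomposition mentioned above, one plausible approach is to treat each option as its own yes/no market. The `BinaryMarket` class and `decompose` function here are hypothetical names for this sketch, not taken from the project code, and the real pipeline may handle edge cases (multiple winners, cancelled options) that this ignores.

```python
from dataclasses import dataclass


@dataclass
class BinaryMarket:
    question: str
    probability: float
    resolution: int  # 1 if this option was the winner, otherwise 0


def decompose(question: str, option_probs: dict[str, float], winner: str) -> list[BinaryMarket]:
    """Turn one multiple-choice market into one binary market per option."""
    return [
        BinaryMarket(f"{question} [{option}]", prob, int(option == winner))
        for option, prob in option_probs.items()
    ]


# Hypothetical three-option market, invented for illustration.
binary_markets = decompose("Who wins?", {"A": 0.5, "B": 0.3, "C": 0.2}, winner="B")
for market in binary_markets:
    print(market)
```

Each resulting binary market can then be scored with the same Brier calculation as any other yes/no market.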
Additionally, I was noticing some issues with Calibration City I wanted to address:
The previous extract-transform-load data pipeline took a long time to run and failed often, which led to me not refreshing the database for months at a time. Now the entire thing is automated and much more resilient, allowing us to gather new markets every single day.
Over time the old site became slower and slower, and it didn’t always load correctly on the first visit. Caching didn’t work quite right, so there was always a lot of load on the server. The new site is completely static, cached properly, and loads instantly with all data. It’s also much easier to develop, and easier for others to get the data for their own experiments.
The old site was often cited as proof that markets are accurate without explanation or context, leaving visitors confused unless the person who linked it also gave an explanation. The primary calibration chart looks neat, but doesn’t really mean anything unless you already know about calibration. The Introduction page was supposed to be a remedy for that, but basically nobody has read it. In response, every chart and visualization on the new site has some sort of explanation of what the chart means, and most also have the results and context presented in a way new users can understand. We also address the primary question most users have - “How accurate are prediction markets?” - right at the very beginning.
There was a split between users trying to prove that prediction markets in general are great and those trying to prove that their favorite site is the best. The old site had enough data that you could try to prove either one, but it wasn’t built with that in mind. In the new site I put both viewpoints front and center and tried to answer each honestly and directly.
Wrap-Up 🏁
I still have a lot of work to do here, but I’m closing this project because I think I’ve completed my main goal. My main focus now is getting user feedback, making things crystal clear, and matching more markets together. My roadmap is alongside the project code on GitHub, and both are open to community contribution.
To all of my donors: thank you for your contributions and kind words. Without your encouragement Brier.fyi would not have happened. Feel free to get in touch with me anytime. I’ll be at Manifest next weekend if anyone wants to say hello!