Jack
Different forecasting platforms often show substantially different predictions for the same questions. I want to collect and analyze data to compare prediction accuracy across platforms, and to understand how different methods of eliciting and aggregating forecasts compare - e.g. prediction markets vs prediction polls, and real-money vs play-money prediction markets. How similar or different are their accuracies? Are there types of questions where some platforms perform better than others?
I have identified a few categories of forecasting questions where identical or near-identical questions can be directly compared across multiple forecasting platforms - world events, politics, elections, sports, and the ACX prediction contest all offer opportunities for comparison between some subsets of platforms. Platforms I expect to look at include Metaculus, Manifold, Polymarket, PredictIt, Good Judgment, and possibly others such as Insight, Betfair, etc. if data and time permit. I would also like to compare these to model-based forecasters such as 538.
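As a toy sketch of the question-matching step, something like simple title similarity could surface candidate pairs across platforms (the titles and threshold below are made up for illustration, and I would expect to verify matches by hand):

```python
# Toy sketch: pair near-identical questions across platforms by title
# similarity. difflib is a rough stand-in; candidates still need review.
from difflib import SequenceMatcher

metaculus_titles = ["Will the Republicans win the 2024 US presidential election?"]
manifold_titles = [
    "Will a Republican win the 2024 US presidential election?",
    "Will Argentina win the 2023 FIFA Women's World Cup?",
]

for a in metaculus_titles:
    for b in manifold_titles:
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio > 0.85:  # arbitrary cutoff, chosen for illustration
            print(f"candidate match ({ratio:.2f}):\n  {a}\n  {b}")
```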
I want to analyze prediction accuracy scores for these comparisons (see the post mentioned in the next section for an example of this type of analysis), and if possible try to understand where the differences come from, what edge the different platforms may have, and to what extent one platform is pricing in the predictions of another platform. I also want to experiment with methods of aggregating the forecasts of different platforms together to produce a meta-forecast.
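To make the scoring concrete, here is a minimal Python sketch of the kind of per-platform accuracy comparison and naive meta-forecast I have in mind, using the Brier score as one standard accuracy measure (the data, column names, and equal-weight averaging are illustrative assumptions, not the final methodology):

```python
import pandas as pd

# Hypothetical data: one row per (platform, question) pair, with the
# platform's probability forecast and the question's 0/1 resolution.
df = pd.DataFrame({
    "platform":   ["Metaculus", "Manifold", "Metaculus", "Manifold"],
    "question":   ["q1", "q1", "q2", "q2"],
    "forecast":   [0.70, 0.80, 0.30, 0.20],
    "resolution": [1, 1, 0, 0],
})

# Brier score: mean squared error of the probability forecast
# (lower is better).
df["brier"] = (df["forecast"] - df["resolution"]) ** 2
print(df.groupby("platform")["brier"].mean())

# Naive meta-forecast: average the platforms' probabilities per question,
# then score the aggregate the same way.
meta = df.groupby("question").agg(
    forecast=("forecast", "mean"),
    resolution=("resolution", "first"),
)
meta_brier = ((meta["forecast"] - meta["resolution"]) ** 2).mean()
print(f"meta-forecast Brier: {meta_brier:.3f}")
```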
Last fall I did an analysis comparing election forecast accuracy for the 2022 US elections: https://firstsigma.substack.com/p/midterm-elections-forecast-comparison-analysis. As I point out there, a single election cycle gives a set of highly correlated questions with very limited information value. This project will extend that analysis to more types of forecasts. Additionally, that previous work only looked at forecasts from the night before the election - this project will examine forecasts across a range of time horizons before resolution.
The funding is to provide an incentive/reward for me to do this project - I've been thinking about it for a while but haven't been able to prioritize it.
At the minimum funding valuation, I anticipate being able to collect data for 2-3 of the platforms listed above and run broad analyses without too many cross-tabs.
With a higher funding valuation, I expect to personally spend more time on the project and/or bring on a collaborator to extend it further (I am already discussing this project with them and have worked with them on other forecasting projects). I anticipate collecting data for more platforms and diving deeper into some questions, e.g.:
- Investigating how well different platforms perform at different time horizons before question resolution. For example, I would expect to observe the impact of discount rates on prediction markets (a rough sketch of this cut appears after this list).
- Analyzing predictive performance on different question categories, e.g. politics vs sports.
- Analyzing the impact of the number of forecasters and of liquidity (for prediction markets), and controlling for them (e.g. comparing questions with similar numbers of forecasters).
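As a concrete illustration of the time-horizon cut, here is a minimal Python sketch (the snapshot data is made up and the horizon buckets are arbitrary; the real analysis would use many questions per bucket):

```python
import pandas as pd

# Hypothetical long-format data: one row per (platform, question, snapshot),
# where days_to_close is the snapshot's lead time before resolution.
snapshots = pd.DataFrame({
    "platform":      ["Polymarket", "Polymarket", "Metaculus", "Metaculus"],
    "question":      ["q1", "q1", "q1", "q1"],
    "days_to_close": [90, 1, 90, 1],
    "forecast":      [0.55, 0.75, 0.60, 0.80],
    "resolution":    [1, 1, 1, 1],
})

# Bucket snapshots by lead time so platforms are scored on comparable
# horizons; a persistent accuracy gap at long horizons would be consistent
# with discount rates suppressing far-out prediction-market prices.
snapshots["horizon"] = pd.cut(
    snapshots["days_to_close"],
    bins=[0, 7, 30, 90, 365],
    labels=["<1wk", "1-4wk", "1-3mo", "3-12mo"],
)
snapshots["brier"] = (snapshots["forecast"] - snapshots["resolution"]) ** 2
print(snapshots.groupby(["platform", "horizon"], observed=True)["brier"].mean())
```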