I recently submitted a paper based on this research to an ML conference, wrapping up the project. There is no public version of the paper yet, but one will be released after the first acceptance/rejection decision, and an Alignment Forum post covering the topic will be released within two weeks.
The main results of this project are as follows:
- Formalized the preliminary results, including streamlining proofs and removing assumptions
- Demonstrated that the results hold for decisions based on average prediction, not just the most preferred prediction (as shown initially), which makes the process much easier to implement with a human decision maker
- Showed that it is possible to elicit honest predictions for all actions, not only the one actually chosen
- Proved that the zero-sum setup is the unique mechanism that incentivizes the desired behavior
- Showed that the space of actions can be searched for the optimal action in O(1) time, improving on the O(log n) bound of the preliminary result
- Avoided the cost of training additional models to implement zero-sum competition by instead using multiple dropout masks of a single model (see the sketch after this list)
- In the first major experiment, showed that the zero-sum setup avoids performative prediction, even in an environment that incentivizes it
- In the second major experiment, showed that the zero-sum setup trains performative prediction out of a model faster and more thoroughly than a stop-gradient through the choice of action
- Ran various robustness checks, including showing that the results hold even when the predictors have access to different information
- Showed that decision markets can also be structured to avoid performative prediction (this result was cut from the submitted paper for space)
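Since the paper is not yet public, here is a minimal sketch of how the zero-sum setup can be implemented with two dropout masks of a single model, as referenced in the list above. It assumes a PyTorch-style setup where the model predicts the probability of a binary outcome conditional on each action; the `Predictor` architecture, the `zero_sum_loss` and `choose_action` functions, and all shapes are hypothetical illustrations, not the paper's actual code.

```python
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """Predicts P(outcome | action). Architecture and shapes are illustrative."""
    def __init__(self, n_obs: int, n_actions: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_obs, 64), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.5),
        )
        self.head = nn.Linear(64, n_actions)  # one logit per action

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # In training mode, each call samples a fresh dropout mask, so two
        # forward passes act as two competitors sharing the same weights.
        return self.head(self.body(obs))

@torch.no_grad()
def choose_action(model: Predictor, obs: torch.Tensor) -> torch.Tensor:
    # Decide based on the *average* of the two competitors' conditional
    # predictions (here: the action with the highest average probability).
    p_avg = (torch.sigmoid(model(obs)) + torch.sigmoid(model(obs))) / 2
    return p_avg.argmax(dim=1)

def zero_sum_loss(model, obs, action, outcome, eps=1e-7):
    """obs: [B, n_obs]; action: [B] long; outcome: [B] float in {0, 1}."""
    # Two forward passes = two dropout masks = two competing predictions.
    p_a = torch.sigmoid(model(obs).gather(1, action[:, None]).squeeze(1))
    p_b = torch.sigmoid(model(obs).gather(1, action[:, None]).squeeze(1))
    # Log score of each competitor's prediction for the chosen action.
    score_a = outcome * torch.log(p_a + eps) + (1 - outcome) * torch.log(1 - p_a + eps)
    score_b = outcome * torch.log(p_b + eps) + (1 - outcome) * torch.log(1 - p_b + eps)
    # Each mask's reward is its score minus the other's. Steering the
    # decision moves both scores together, so it confers no zero-sum
    # advantage, which is what removes the performative incentive.
    loss_a = -(score_a - score_b.detach()).mean()
    loss_b = -(score_b - score_a.detach()).mean()
    return loss_a + loss_b
```

Detaching the opponent's score mirrors the game-theoretic setup, in which each competitor treats the other's prediction as fixed; each mask's gradient then depends only on its own score.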
The only result I hoped to produce but did not accomplish was extending the mechanism to cases where different predictors have private information. However, this is much less urgent when the mechanism is implemented as two masks of the same model, which have access to identical information, and the experiments showed that private information does not make a difference in the toy model. I may be able to develop a theoretical solution to private information in future work, but having worked on the problem extensively, I believe such a solution is unlikely to exist, at least without making unrealistic further assumptions.
Overall, I'm happy with the outcome of this project. While there is still room for follow-up work, I believe it presents a first-pass solution to the problem of performative prediction. Through the course of working on this project, I have also come to believe that being able to elicit honest conditional predictions will have further applications to safety beyond performative prediction, especially with respect to online training and myopia.
I will also address the fact that this project took considerably longer than expected to complete. I had hoped to have SPAR mentees implement the experiments, but was unable to generate useful work from them. I consider this setback entirely my own fault, as I should not have counted on volunteer, part-time labor, especially for work beyond what I could implement myself. After deciding to run the experiments myself, I took time to build up the necessary background, so I do not anticipate this being a bottleneck in future work. A secondary reason for the delay is that I returned to my PhD after the first four months, which reintroduced other demands on my time.
This project was completed solo, although I benefited from discussions with Johannes Treutlein, editing from Simon Marshall, and code review from Dan Valentine.