As an update on this project: I suggested that Rob collaborate with two of my MATS scholars, Ivan and Jett, to combine a bunch of results on CoT unfaithfulness into an ICML submission. We submitted the paper and are now tweaking it before posting it on arXiv. I'm happy to send a draft to anyone who reaches out to me (<first name><second name>@gmail.com).
This has meant we've fallen behind schedule on releasing a dataset here. We will still release it, and plan to open-source our findings when we post to arXiv; I just personally feel that landing an impactful paper is more important than rushing out early data. Happy to get pushback on this!