Hi everyone! Sorry for the ridiculously slow update.
This grant was absolutely clutch. I did indeed do more of the category (1) work that Joel Becker recommends later, finishing a paper with Ethan Perez, "Towards Evaluating AI Systems for Moral Status Using Self-Reports" (2023).
That paper has since turned into a mini research program. For a more recent example, see Binder et al. (2024), "Looking Inward: Language Models Can Learn About Themselves by Introspection".