I've supervised Javier & Oscar on a paper before, and they've done great work. This project seems promising - to my knowledge there's been little work on long-form hallucination detection, I expect there's low-hanging fruit here, and the results so far seem decent. This is a fairly small grant that will help unblock the project, so it feels like a no-brainer to me.
I think that detecting hallucinations in long-form content is a good downstream task for improving monitoring methods (and for finding insights that translate to higher-stakes tasks like control or deception), in addition to being near-term useful for making models safer. Improving factuality is also likely to make systems more profitable, which is slightly accelerationist, but I'm generally in favour of work that legibly demonstrates how safety is incentivised by profit-seeking.
CoI Note: I am a (very hands-off) supervisor for this project and will likely be a co-author on the resulting paper. But my quality bar for spending my own time on a project is notably higher than my bar for regrants, so I don't feel too concerned.