I use this codebase. It replicates, and it is a low-overhead environment for studying reward hacking, which speeds up research iterations.
@wassname
Independent Alignment Researcher
https://wassname.com
Michael Clark
6 days ago