Manifund
Michael Clark

@wassname

Independent Alignment Researcher

https://wassname.com
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

Comments

Mitigating Reward Hacking Through RL Training Interventions

Michael Clark

6 days ago

I use this codebase. It replicates, and it is a low-overhead environment for studying reward hacking, which speeds up research iterations.