Manifund
Sudarsh Kunnavakkam

@skunnavakkam

https://sudarsh.com
$2,000 total balance
$0 charity balance
$2,000 cash balance

$0 in pending offers

About Me

I'm Sudarsh. I'm a first-year at Caltech studying Physics. I also do research as a contractor at METR and work in the Shapiro Lab at Caltech, among other things. Please reach out if you'd like, or check out my website at https://sudarsh.com :D

Projects

Tooling + Model Orgs for CoT Faithfulness Research

Comments

Tooling + Model Orgs for CoT Faithfulness Research

Sudarsh Kunnavakkam

17 days ago

@NeelNanda i'm doing this by training against a monitor (so negatively rewarding when the monitor fires). the reasoning traces definitely get higher reward (they fire the monitor less) and look by eye to be more obfuscated, but i think this "look by eye" is likely confirmation bias. i think continuing training from these checkpoints would work!

Transactions

For                                                   Date                 Type              Amount
Manifund Bank                                         about 18 hours ago   withdraw          1000
Tooling + Model Orgs for CoT Faithfulness Research    3 days ago           project donation  +3000