Sudarsh Kunnavakkam

@skunnavakkam

https://sudarsh.com

$0 total balance

$0 charity balance

$0 cash balance

$0 in pending offers

About Me

I'm Sudarsh, a first-year at Caltech studying physics. I also do research as a contractor at METR and work in the Shapiro Lab at Caltech, among other things. Please reach out if you'd like, or check out my website at https://sudarsh.com :D

Projects

Tooling + Model Orgs for CoT Faithfulness Research

Outgoing donations

Marius Hobbhahn

$2000

3 months ago

Comments

Tooling + Model Orgs for CoT Faithfulness Research

Sudarsh Kunnavakkam

12 months ago

@NeelNanda i'm doing this by training against a monitor (so negatively rewarding when the monitor fires). the reasoning traces definitely get higher reward (they fire the monitor less) and look by eye to be more obfuscated, but i think this "look by eye" impression is likely confirmation bias. i think continuing training from these checkpoints would work!
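
As a rough illustration of "negatively rewarding when the monitor fires", here is a minimal sketch of the reward shaping. It is not the project's actual training code: `shaped_reward`, `keyword_monitor`, the `penalty` value, and the flagged phrase are all hypothetical names and choices made up for this example, and a real monitor would more likely be a model than a keyword check.

```python
from typing import Callable


def shaped_reward(
    task_reward: float,
    trace: str,
    monitor: Callable[[str], bool],
    penalty: float = 1.0,
) -> float:
    """Subtract a fixed penalty from the task reward when the monitor fires.

    `monitor` is any callable that returns True when it flags the trace.
    The penalty size is an assumption for illustration only.
    """
    if monitor(trace):
        return task_reward - penalty
    return task_reward


def keyword_monitor(trace: str) -> bool:
    # Toy stand-in for a monitor: fires when the trace contains a flagged phrase.
    return "hidden goal" in trace.lower()


# A flagged trace loses the penalty; an unflagged trace keeps its task reward.
flagged = shaped_reward(1.0, "My hidden goal is X", keyword_monitor)
clean = shaped_reward(1.0, "Plain step-by-step reasoning", keyword_monitor)
```

Under this kind of shaping, traces that avoid firing the monitor score higher, which matches the "higher reward (fires the monitor less)" observation but says nothing on its own about whether the traces are actually more obfuscated.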

Transactions

For	Date	Type	Amount (USD)
<9af369f0-dc1e-4577-9bb7-4ad4cb87131f>	3 months ago	profile donation	2000
Sudarsh Kunnavakkam	3 months ago	cash to charity transfer	2000
Manifund Bank	12 months ago	withdraw	1000
Tooling + Model Orgs for CoT Faithfulness Research	12 months ago	project donation	+3000