🍉

Cadenza Labs

@cadenza_labs

We are a new AI Safety Org, focusing on Conceptual Interpretability

https://cadenzalabs.org/

$0 total balance

$0 charity balance

$0 cash balance

$0 in pending offers

About Me

Our group's goal is to do research that contributes to solving AI alignment. Broadly, we aim to work on whichever technical alignment projects have the highest expected value. Our current best ideas for research directions are in interpretability. More about our research agenda can be found here.

Projects

Cadenza Labs: AI Safety research group working on own interpretability agenda

Comments

Cadenza Labs: AI Safety research group working on own interpretability agenda

🍉

Cadenza Labs

11 months ago

Final report

Description of subprojects and results, including major changes from the original proposal

1. We mostly worked on a paper studying ways to improve unsupervised probing methods via clustering. The paper was accepted to the MechInterp workshop at ICML. We have also submitted it to a top conference, where it is now under review.
2. We mentored in the spring 2024 iteration of SPAR. There we worked on a project using probing methods to elicit the value of a state from inside policy and value networks in reinforcement learning. In addition, three SPAR students worked on the paper described in item 1.
3. Our lead researcher was also involved in multiple other projects that were accepted to the MechInterp workshop, here and here.

Spending breakdown

Since we received only a relatively small part of the funding we sought, we were only able to cover the costs of our stay at FAR Labs in Berkeley for two people (flights, accommodation, food, etc.). FAR Labs invited us to their team-in-residency program, where we worked mostly on the projects above. Some of the funds also went to compute and the ICML conference.

Transactions

For	Date	Type	Amount
Manifund Bank	11 months ago	withdraw	1
Manifund Bank	11 months ago	withdraw	7809
Cadenza Labs: AI Safety research group working on own interpretability agenda	about 1 year ago	project donation	+100
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+5000
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+100
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+10
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+100
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+790
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+1000
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+210
Cadenza Labs: AI Safety research group working on own interpretability agenda	over 1 year ago	project donation	+500
