Virtue-Ethical Rationality and Training Dynamics

Peli Grietzer

Active Grant
$20,000 raised
$30,000 funding goal

In 'On Eudaimonia and Optimization' I argue that the concept of eudaimonia -- active, rational flourishing -- points to a framework for rational action that differs from standard instrumental planning. In a word, I relate eudaimonic action to commitment to 'promoting x x-ingly' (promoting peace peacefully, promoting mathematics mathematically, promoting democracy democratically), and argue that x's that can materially sustain such a commitment are an important type of natural abstraction.

When turning to AI alignment proper, I argue that:

  1. Human values and human ideas of prosocial behavior (what Joe Carlsmith calls 'being nicer than Clippy') become easier to operationalize when considered as eudaimonic practices

  2. The 'eudaimonic practices' framework is attuned to RL and RL-related dynamics at multiple scales, from the post-training of frontier models to human sociotechnical dynamics

  3. The prospects for instilling AIs with narrowly safety-focussed values such as 'transparency' and possibly 'corrigibility' improve when we approach these values as analogous to eudaimonic practice

The purpose of the current project (3-4 months) is to cash out these claims in detail, begin to substantiate them empirically, and leverage the resulting analysis into testable guidelines for LLM post-training. Funding beyond $10k will go partly towards compute and partly towards freeing additional working hours. Funding beyond $20k will go towards ML engineer hours and compute.
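
To gesture at what a testable post-training guideline could look like, here is a minimal sketch of a reward term in the spirit of 'promoting x x-ingly': it scores both whether a trajectory promotes x and whether it does so x-ingly. Everything below (the function names, the weighting scheme) is an illustrative assumption, not a design specified in the essay or the proposal.

# Illustrative sketch only: a toy 'promoting x x-ingly' reward term for
# RL post-training. Names and weighting are assumptions for illustration.

def eudaimonic_reward(outcome_score: float, manner_score: float,
                      manner_weight: float = 0.5) -> float:
    """Blend how well a trajectory promotes x (outcome) with how x-ingly
    it promotes it (manner). The multiplicative form means a strong
    outcome cannot fully compensate for conduct that betrays the practice."""
    return outcome_score * (1.0 - manner_weight + manner_weight * manner_score)

# A highly effective but coercive 'promoting peace' trajectory scores
# below a moderately effective but peaceful one:
print(eudaimonic_reward(0.9, 0.2))  # 0.54
print(eudaimonic_reward(0.6, 0.9))  # 0.57

One could then compare such a term against a pure-outcome reward in post-training and test whether the manner constraint yields more stable behavior, which is the kind of testable guideline the project aims to produce.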

Comments (3) · Donations (2)
Austin Chen

6 days ago

Approving this project. As a sidenote, I'm very grateful for Gavin's clear writeup explaining why he wanted to fund this!

donated $10,000

Gavin Leech

7 days ago

Peli is a frugal philosopher who cowrote one of my favourite essays on alignment.

The idea: He wants to invent virtue post-training, inspired by e.g. the actual normative practice of fields like maths, which often appeals to self-instantiation. It's an open question whether current capabilities allow for instilling stable loops like this, and I'm glad someone is trying.

Counterfactual: The current project struck out with the usual funders, I guess because theory is de-emphasised now, or because virtue ethics is usually unoperationalisable, or because he's not good at sales, or because they don't know the following about him.

Track record: Besides the (great) essay above, I've worked with Peli on two technical ML research papers and was impressed with his experimental skill, design skill, and precision. He also did some invited technical replication work on Turner 2023. He's competent at ML experiments at the level of a decent PhD student in the field. He is very used to working totally independently from inception to completion.

Concerns: If I didn't know the above, the doc would worry me with how totally non-ML-technical it is (it intentionally does some original philosophical work as a prerequisite for the technical part). He's also been looking for collaborators without much success; I hope that giving him independent funding makes this search an easier pitch.

I expect distribution to be the weakest part of the project. The inferential distance might be too high for the narrower part of the ingroup audience to bridge, even with him presenting good data. But places like PAW, AIES and HAAISS will certainly engage. He's capable of doing the conference-paper grind but doesn't seem motivated by it. Maybe a collaborator could bring the will to actively disseminate the results. But if distribution is the worst risk for a speculative, ambitious project, then we're in a good spot.

Cost-effectiveness: Very high, at $40k/FTE for taking an intriguing idea and bringing it to testability.

Conflict of interest: As noted, Peli has worked on several projects at my company Arb and I've known him for years on Twitter.


Peli Grietzer

6 days ago

@gleech I actually wouldn't say I've had trouble drawing collaborators. Kenny Easwaran and Tan Zhi-Xuan are on board as volunteer decision-theory advisers, and I haven't recruited for ML engineer collaborators yet since I'll be playing that by ear and budget.