In 'On Eudaimonia and Optimization' I argue that the concept of eudaimonia -- active, rational flourishing -- points to a framework for rational action that differs from standard instrumental planning. In brief, I relate eudaimonic action to a commitment to 'promoting x x-ingly' (promoting peace peacefully, promoting mathematics mathematically, promoting democracy democratically), and argue that the x's that can materially sustain such a commitment form an important type of natural abstraction.
Turning to AI alignment proper, I argue that:
Human values and human ideas of prosocial behavior (what Joe Carlsmith calls 'being nicer than Clippy') become easier to operationalize when considered as eudaimonic practices
The 'eudaimonic practices' framework is attuned to RL and RL-related dynamics at multiple scales, from the post-training of frontier models to human sociotechnical dynamics
The prospects for instilling AIs with narrowly safety-focused values such as 'transparency' and possibly 'corrigibility' improve when we approach these values as analogous to eudaimonic practices
The purpose of the current project (3-4 months) is to cash out these claims in detail, begin to substantiate them empirically, and translate the resulting analysis into testable guidelines for LLM post-training. Funding beyond $10k will go partly towards compute and partly towards freeing additional working hours. Funding beyond $20k will go towards ML engineer hours and compute.