Come on, let's embark on an adventure together!
Goal of Project: Automating Economic Experiments with an LLM Multi-Actor Tool
Any questions about this project are welcome @ rvbergem@gmail.com
Long-run outcome in case of success:
1) Drastically lower the cost of (controlled) economic experimentation
2) Automated testing of novel governance mechanisms with multiple 'agentized' LLM instances that are tuned to simulate specific types of behavior, under digitally controlled conditions.
Context:
Large Language Models (LLMs) are implicit computational models of humans. 'They' can be given endowments, information, and preferences in a simulated economic scenario to study their behavior. Although it is known that LLMs can play simple economic experiments, it is unclear how complex the scenario can be made and how human-like the LLM's behavior will be in such situations.
We want to build a Multi-Actor LLM tool that allows researchers and developers to simulate the behavior of different, realistic types of agents and their interactions in more complex economic scenarios.
Improvements over traditional Economic Experimentation:
If our Multi-Actor LLM tool is able to successfully facilitate increasingly complicated economic simulations, it will have profound consequences.
Currently, 'real-life' economic experimentation to test novel governance mechanisms has the following downsides:
1) The costs of running economic experiments are often high. A typical economic lab experiment costs between $5,000 and $10,000. Human participants need to be compensated for their time and performance, and the physical lab itself requires infrastructure as well as the time of the researchers running it.
2) Human participants need to be recruited and must play the experiments under controlled conditions in a physical lab. Running controlled experiments to test one novel governance mechanism, involving hundreds of students, often takes several months.
3) Human participants (often university students) need time to adjust to the simulated economic environment through instruction, which limits how complex the economic lab experiment can be.
Our Multi-Actor LLM tool aims to automate economic (lab) experiments. If successful, we will be able to reduce the cost of experimentation significantly; potentially, the cost of experimentation can be reduced to that of the required tokens. The months it takes to run 'real-life' experiments can potentially be reduced to mere seconds.
The ability to easily spin up multiple LLM 'agents' that play our economic games allows for LLM agent fine-tuning. We will be able to add specific types of behavioral agents through fine-tuned models, LoRA-adapted models, or behavioral prompts. That allows for a wider range of testing options than the student-based tests that are now standard in experimental labs. It would be possible, for instance, to train some agents using reinforcement learning and pit them against agents without that training to see if the latter can be exploited. Such tests will tell us, for instance, which governance mechanisms pose high risks to new users.
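As a rough illustration of the behavioral-prompt route, each agent type could be a system prompt prepended to the game prompt before the API call. The type names and prompt texts below are hypothetical examples, not the final taxonomy:

```python
# Sketch: behavioral agent types expressed as system prompts.
# All type names and prompt wordings here are illustrative placeholders.

BEHAVIORAL_PROMPTS = {
    "selfish":     "You maximize only your own monetary payoff.",
    "cooperative": "You weigh group welfare equally against your own payoff.",
    "risk_averse": "You strongly prefer certain payoffs over risky gambles.",
}

def build_messages(agent_type: str, game_prompt: str) -> list[dict]:
    """Compose chat messages for one agent of the given behavioral type."""
    return [
        {"role": "system", "content": BEHAVIORAL_PROMPTS[agent_type]},
        {"role": "user", "content": game_prompt},
    ]

msgs = build_messages(
    "selfish",
    "You own plot 3. The Harberger tax is 7%. State your self-assessed price.",
)
```

Fine-tuned or LoRA-adapted models would replace the system prompt with learned behavior, but the message-composition step stays the same.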
Approach:
One of our real-life lab experiments tests Robin Hanson's concept of Futarchy and the concept of Harberger taxation, applied to the problem of spatial planning. We chose this experiment given its relatively high complexity for a lab experiment. The experiment provides us with real-life human behavioral data, as well as data on the efficiency of the Futarchy and Harberger mechanisms, that we can replicate with our Multi-Actor LLM tool.
Although the LLM space is rapidly changing, the current plan is to leverage OpenAI's API to build a 'communication layer' that interacts with the server where our spatial planning game software is hosted. The idea is to have an individual LLM instance take each of the roles in the experimental simulation: 6 speculators, 5 owners, and 1 developer.
As in our real experiment, these different players will be given different endowments, preferences, and information over the course of the game. The server can prompt each player instance with the information it requires and ask for a choice in the experiment. The choice can be fed back to the server in a specified JSON format. The server can use these JSON inputs to determine intermediate results, leading to new prompts with the updated game state and a request for input. This process continues until the game is completed.
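The prompt/response loop above can be sketched as follows. This is a minimal illustration of the control flow only: `call_llm` stands in for the real OpenAI API request (stubbed here with a fixed reply), and all function names and the JSON fields are hypothetical:

```python
import json

def call_llm(role: str, prompt: str) -> str:
    # In the real tool this would be an OpenAI chat-completion request
    # with a role-specific prompt. Stubbed here: every agent bids 100.
    return json.dumps({"action": "bid", "amount": 100})

def play_round(game_state: dict, roles: list[str]) -> dict:
    """Prompt every agent with the current state and collect JSON choices."""
    choices = {}
    for role in roles:
        prompt = (
            f"Game state: {json.dumps(game_state)}. "
            f"Your role: {role}. Reply in the specified JSON format."
        )
        reply = call_llm(role, prompt)
        choices[role] = json.loads(reply)  # parse the agent's JSON choice
    # The server would resolve the choices into intermediate results here;
    # for the sketch we only advance the round counter and record choices.
    return {**game_state, "round": game_state["round"] + 1,
            "last_choices": choices}

state = {"round": 0}
roles = ["speculator_1", "owner_1", "developer"]
state = play_round(state, roles)  # repeated until the game is completed
```

In the full tool this loop would run over all 12 roles (6 speculators, 5 owners, 1 developer) until the game-completion condition on the server side is reached.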
This will create a dataset with the same form as the data obtained from our real-life lab experiments. Through alterations in the way we prompt the LLM agents, as well as by fine-tuning the LLM agents' behavior, we will try to get as close as possible to the results obtained in the lab.
Finally, we can employ reinforcement learning algorithms and repeated play to see how governance mechanisms fare when agents learn to use the environment better. In the latter sense our tool would be suitable for testing whether various governance structures are robust political economic designs.
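One very simple form such repeated play could take is a bandit-style learner choosing among a fixed set of strategies and updating its value estimates from noisy payoffs. The strategy names and payoffs below are invented for illustration; the real tool could use far richer reinforcement-learning algorithms:

```python
import random

def run_repeated_play(payoffs: dict[str, float], episodes: int = 500,
                      epsilon: float = 0.1, seed: int = 0) -> str:
    """Epsilon-greedy repeated play over a fixed strategy set.

    Returns the strategy with the highest learned value estimate.
    """
    rng = random.Random(seed)
    value = {s: 0.0 for s in payoffs}   # running value estimates
    count = {s: 0 for s in payoffs}     # times each strategy was played
    for _ in range(episodes):
        if rng.random() < epsilon:                  # explore
            strategy = rng.choice(list(payoffs))
        else:                                       # exploit best estimate
            strategy = max(value, key=value.get)
        reward = payoffs[strategy] + rng.gauss(0, 0.1)  # noisy payoff
        count[strategy] += 1
        # incremental mean update of the value estimate
        value[strategy] += (reward - value[strategy]) / count[strategy]
    return max(value, key=value.get)

# Hypothetical strategies and mean payoffs for one agent in the game:
best = run_repeated_play({"hold": 1.0, "sell_early": 0.4, "overbid": 0.2})
```

If a mechanism only performs well before agents have learned the environment, this kind of repeated play would expose that fragility.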
Expected Result:
Having gone through these steps, we will have a multi-actor LLM tool that has been tested in a complex economic game. The communication layer we build can be adapted to any multi-actor economic game hosted on a server. A longer-run goal is to have a separate LLM instance re-design the economic rules on the server side, with the assignment of optimizing the rules to achieve some specified social utility function. The latter version of our tool would resemble an evolutionary governance simulator.
We are a group of three with a track record of working together on various projects over the past two years. Sander Renes (SR) and Rutger van Bergem (RvB) work together as economists at the Delft University of Technology (TU Delft), where they co-teach courses in Micro-Economics, Economics of Infrastructures, and Institutional Design in Complex Systems. They co-founded a behavioral and group decision lab (IBEX-Lab) at TU Delft, in which they test novel governance designs under controlled conditions.
Yary Ribero (YR) and RvB have worked together on a startup project that aimed to build an algorithmic stabletoken currency board with a 'crawling peg' for an e-commerce platform in Kenya. The project ended when the counterparty pulled the plug; ultimately it failed for lack of market fit.
SR, RvB, and YR (the applying team) have been working together for over a year building the software that facilitates the "Futarchy and Harberger for Spatial Planning" experiments currently ongoing in the IBEX-Lab. That software will be the infrastructure in which we initially test the LLM Multi-Actor tool.
Our newly founded lab:
The economic researchers:
https://www.linkedin.com/in/rutger-van-bergem54856a9b/
https://www.linkedin.com/in/sander-renes-45247360/
The software development lead for this project:
https://www.linkedin.com/in/yary-r-68759913/
$15,000 - $25,000 to build an LLM Multi-Actor tool able to (1) connect to a browser-based economic experiment and (2) spin up different LLM instances able to communicate with each other and with the server hosting the economic game.
An additional $10,000 would allow us to create more adaptations of the LLM agents, resulting in several types of behavioral agents, among them an agent type that is maximally trained to exploit other players within the same economic environment. Why do that? To ensure that novel governance designs are robust to maximally self-interested behavior.
Conditional on getting the $15,000 - $25,000 to test our idea, we estimate (1) an 80% probability of being able to build the tool that can play the game, (2) a 30% probability of being able to replicate the "spatial planning lab experiment" closely enough to reproduce human behavior under the experimental conditions, and (3) a 15% probability of being able to employ reinforcement learning to optimize LLM behavior in the simulated environment.