I am developing a paper titled “Semantic Implementation of VCG Mechanisms in LLM-Mediated Labor Markets.”
The paper studies a simple but important question: when a formal market mechanism is implemented through a language-model interface, do the classical guarantees of the mechanism survive?
In my current OpenAI-side experiments, a 1000-episode gpt-4o-mini labor-market simulation shows that structured-output VCG achieves mean welfare capture of 0.683, barely above random matching at 0.678, despite zero parse failures and zero fallbacks. Diagnostics show that the failure is not syntactic: the model produces valid numerical reports, but those reports sometimes falsely exclude true-positive firm-worker edges from the reported-positive graph, causing under-trading.
A 300-episode numerical anchoring ablation restores full welfare capture for gpt-4o-mini, and a 300-episode OpenAI model-family audit shows that gpt-4.1-nano, gpt-4.1-mini, gpt-4.1, and gpt-4o all achieve full welfare capture under the standard structured channel. This supports the paper’s core claim: the failure is model–prompt–parser dependent, not a refutation of VCG or a universal LLM failure.
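To make the diagnostic concrete, here is a minimal per-episode sketch in Python. All names (`true_edges`, `reported_edges`) and the normalization of welfare capture as realized welfare over oracle welfare are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal per-episode diagnostic sketch (hypothetical names and normalization).
# true_edges: set of (firm, worker) pairs with positive true match surplus.
# reported_edges: pairs whose model-reported numerical values come out positive.
# Welfare capture is taken here as realized welfare / oracle welfare.

def episode_diagnostics(true_edges, reported_edges,
                        realized_welfare, oracle_welfare):
    # True-positive edges the reports falsely exclude; each missing edge
    # is a trade the mechanism can no longer execute (under-trading).
    false_excluded = true_edges - reported_edges
    return {
        "welfare_capture": realized_welfare / oracle_welfare,
        "false_excluded_edges": len(false_excluded),
        "under_traded": len(false_excluded) > 0,
    }
```

On this sketch, a syntactically clean run can still lose welfare: parsing succeeds, but `false_excluded_edges` is nonzero and the realized allocation under-trades relative to the oracle.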
The remaining step is cross-provider validation. I am requesting a small API budget to run the same pre-registered diagnostic on Anthropic Claude Sonnet and Google Gemini Flash.
The goal is to complete a cross-provider semantic-channel audit for LLM-mediated direct-revelation mechanisms.
The specific goals are:
- Run a 300-episode Claude Sonnet diagnostic using the same four-arm design:
  - oracle truthful VCG;
  - random matching;
  - standard structured VCG;
  - numerically anchored structured VCG.
- Run a 300-episode Gemini Flash diagnostic using the same four-arm design.
- Compare welfare capture, exact-report rates, false-excluded true-positive edges, under-trading, parse/fallback rates, and absolute welfare loss across the OpenAI, Anthropic, and Google channels.
- Update the paper with a cross-provider model-family audit section.
- Prepare a reproducibility package containing code, raw logs, analysis scripts, figures, tables, and manifests.
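The cross-provider comparison in the goals above reduces to the same per-arm aggregation for each provider. A minimal sketch, assuming hypothetical field names and that per-episode welfare capture is precomputed upstream:

```python
from statistics import mean

# Aggregate per-episode records for one provider/arm combination.
# Each record is assumed to carry a welfare_capture float plus
# parse_failed and fell_back booleans (hypothetical field names).
def summarize_arm(records):
    n = len(records)
    return {
        "mean_welfare_capture": mean(r["welfare_capture"] for r in records),
        "parse_failure_rate": sum(r["parse_failed"] for r in records) / n,
        "fallback_rate": sum(r["fell_back"] for r in records) / n,
    }

# Usage: one summary per provider/arm, then compare side by side
# (illustrative records, not real run data).
claude_standard = [
    {"welfare_capture": 1.0, "parse_failed": False, "fell_back": False},
    {"welfare_capture": 0.5, "parse_failed": True, "fell_back": False},
]
summary = summarize_arm(claude_standard)
```

The point of keeping the aggregation this simple is that the OpenAI, Anthropic, and Google runs can share one analysis path, so any divergence reflects the provider channel rather than the pipeline.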
The codebase, OpenAI experiments, zero-cost diagnostics, and analysis pipeline are already built. The remaining work is mainly API execution, provider integration, diagnostics, and paper integration.
Minimum funding requested: $500.
This would be used for:
- Anthropic Claude Sonnet API credits for a 300-episode cross-provider run.
- Google Gemini Flash API credits for a 300-episode cross-provider run.
- Smoke tests and failed-run buffer.
- Re-running incomplete chunks if logging, schema validation, or parsing metadata fail.
- Reproducibility packaging and final paper integration.
The minimum viable version of this project is the Claude + Gemini 300-episode cross-provider audit. If costs are lower than expected, leftover funds will be used for one small robustness extension, such as a larger-market 5x5 diagnostic or an additional 300-episode confirmation run.
I am currently the sole researcher on this project.
Work already completed:
- Built a corrected simulation codebase for LLM-mediated labor-market mechanisms.
- Completed the main 1000-episode gpt-4o-mini run.
- Completed zero-cost diagnostics from existing logs.
- Completed false-exclusion and under-trading decomposition.
- Completed a 300-episode numerical anchoring ablation.
- Completed a 300-episode OpenAI model-family audit for gpt-4.1-nano, gpt-4.1-mini, gpt-4.1, and gpt-4o.
- Drafted the working paper with theory, diagnostics, figures, and reproducibility appendices.
I am an incoming Mathematics and Statistics student at the University of Warwick. This project is currently being developed as an independent research paper.
The main risks are:
- Anthropic or Gemini API integration may take longer than expected.
- Provider APIs may not support the exact same structured-output interface, requiring careful adaptation.
- Cross-provider results may be ambiguous or not replicate the OpenAI pattern.
- The 3x3 labor-market setting may be considered too small for a journal-level claim without further robustness.
- The final paper may still need an experienced coauthor or advisor before journal submission.
If the project fails technically, the fallback outcome is still useful: the OpenAI-side paper is already complete, and any failed cross-provider attempt will be documented as part of the implementation/reproducibility record.
If the cross-provider results are mixed or negative, I will not overclaim. The paper will report that semantic-channel behavior is provider- and interface-dependent, and that broader validation remains necessary.
I have not raised external research funding in the last 12 months. The work so far has been self-funded, including OpenAI API experiments, smoke tests, failed runs, and analysis.