Φ-Arena and adversarial multi-agent VLA interpretability

Project summary. Adversarial multi-agent embodied control is the regime where two or more physical AI systems optimize against each other in shared physical space: a robot sharing a sidewalk with another robot, two drones contesting the same airspace, an agent guarding a resource against an agent that wants it. The single-agent vision-language-action (VLA) benchmarks (LIBERO, RoboCasa, VLA-Risk) measure none of it. The symbolic multi-agent benchmarks (Melting Pot, PettingZoo, MARL-Lib) skip the physical embodiment. The intersection has no public substrate. We think it should have one, and that the answers will look weirder than the field expects.

Φ(fight) Research is a two-person independent collective founded to build that substrate and study the first questions it raises. We're proposing three papers as the kickstart series, all targeting ICLR 2027 (September 2026 deadline).

The three papers.

Φ-Arena (co-led). Open benchmark for VLA-vs-VLA evaluation across MuJoCo, Isaac Sim, and Genesis. Standardized opponent-conditioned protocols. Energy-bounded constraint regimes. Matchup table of OpenVLA, OpenVLA-OFT, π₀, SmolVLA. Released open-source alongside a HuggingFace dataset of rollout traces.

Mechinterp of self-play policies (Liu Yuchen lead). Circuit-level analysis of exploit-prone subnetworks emerging during adversarial self-play training. Extends activation-intervention methods (matched-random ablation, bypass testing, attention-partition diagnostics) and sparse-autoencoder approaches (per SAEBench, ICML 2025) from static language models to dynamic embodied policies.

Energy-bounded adversarial games (Han Muchen lead). Empirical characterization of how hard physical resource constraints (torque budget, battery, episode length) reshape emergent strategy distributions in self-play, compared to unbounded regimes.
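A minimal sketch of how a per-episode resource budget of this kind could be enforced, assuming a torque-integral budget plus a step cap. The class name `EnergyBudget`, its fields, and the choice of summed |torque| × dt as the cost are illustrative assumptions, not Φ-Arena's actual interface.

```python
from dataclasses import dataclass


@dataclass
class EnergyBudget:
    """Hypothetical per-agent episode budget (names are illustrative)."""

    torque_budget: float  # cap on the accumulated sum of |torque| * dt
    max_steps: int        # cap on episode length, in control steps
    used: float = 0.0
    steps: int = 0

    def exhausted(self) -> bool:
        """True once either the torque budget or the step cap is exceeded."""
        return self.used > self.torque_budget or self.steps >= self.max_steps

    def step(self, torques, dt: float) -> bool:
        """Charge one control step; return True while the agent may keep acting."""
        self.used += sum(abs(t) for t in torques) * dt
        self.steps += 1
        return not self.exhausted()
```

In a two-agent rollout, each agent would carry its own budget, and an agent whose `step` call returns False would be frozen or have the episode terminated, depending on the constraint regime being studied.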

The unit of work is papers, submitted to ICLR 2027 with all code, models, and datasets released open-source at submission time.
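For the mechinterp paper, matched-random ablation can be read as a control: ablate a candidate set of units, then compare the effect against random unit sets of the same size. The sketch below shows that comparison on a toy linear readout. The function names, the zero-ablation choice, and the toy readout are assumptions for illustration, not the toolkit's implementation.

```python
import random


def ablate(activations, indices):
    """Return a copy of the activations with the given units set to zero."""
    idx = set(indices)
    return [0.0 if i in idx else a for i, a in enumerate(activations)]


def readout(activations, weights):
    """Toy stand-in for a policy output: a weighted sum of unit activations."""
    return sum(a * w for a, w in zip(activations, weights))


def matched_random_effect(activations, weights, target, n_controls=100, seed=0):
    """Compare ablating `target` against random unit sets of the same size.

    Returns (target_effect, frac), where frac is the fraction of random
    controls whose effect is at least as large as the target's.
    """
    rng = random.Random(seed)
    base = readout(activations, weights)
    target_effect = abs(base - readout(ablate(activations, target), weights))
    control_effects = []
    for _ in range(n_controls):
        control = rng.sample(range(len(activations)), len(target))
        control_effects.append(
            abs(base - readout(ablate(activations, control), weights))
        )
    frac = sum(e >= target_effect for e in control_effects) / n_controls
    return target_effect, frac
```

A small `frac` suggests the candidate units matter more than an arbitrary set of the same size; a large one suggests the measured effect is not specific to them.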

Long horizon. Φ is built for a ten-year arc. The three papers above are the kickstart series, not the program — the broader agenda (adversarial robustness of VLA and world models, world models under adversarial dynamics, sample-efficient self-play, cross-embodiment adversarial generalization) lives at https://φ.monster. This $15K catalyzes the first six months. The substrate (Φ-Arena), the methodology toolkit (activation intervention, opponent-conditioned eval harness), and the rollout-trace dataset persist past it — and external groups extending them is the actual long-horizon bet.

Success metric. At least three external research groups extend or build on Φ-Arena within six months of public release.

How funding will be used. $7,000: compute beyond Tier 0 grants. $3,500: ICLR 2027 conference attendance. $2,500: open-source infrastructure (HuggingFace Pro, leaderboard backend, storage). $2,000: operational (Wyoming LLC annual report, Form 1065 first-year CPA, domain renewal). Total $15,000. The $5K minimum funds the Φ-Arena population phase (the substrate the other two papers depend on). $10K adds the mechinterp paper. The full $15K funds all three papers plus ICLR travel.

Team.

Liu Yuchen (founder). HKUST EE+AI sophomore. Four papers under double-blind review at NeurIPS 2026 (three sole-author, one co-first), all on mechanistic interpretability and methodology for VLA models. The activation-intervention toolkit Φ depends on, and the closed-loop LIBERO/MuJoCo rollout infrastructure with Wilson CI / cluster bootstrap validation, were built and shipped in those four papers. Review-safe paper descriptions and CV at https://lyrica.φ.monster. Before research: built Squirrel (Rust + MCP AI memory layer for coding agents, ~1K stars Feb 2026) and OfferI (AI study-abroad agent startup, Feb–Dec 2025, shut down). Both were archived on the pivot to full-time research.

Han Muchen (founding researcher). HKUST sophomore. Prior research at the Division of Social Science with Prof. Janet Hui-wen Hsiao on cognitive-science approaches to AI / explainable AI (EMHMM with deep learning).

Φ public infrastructure (live as of May 2026): https://φ.monster · https://github.com/phi-monster · https://huggingface.co/phi-monster

Verification. Anonymized NeurIPS 2026 submission packages (paper + code + results) available on request — email yuchen@φ.monster.

Failure modes. Most likely failure: schedule. Two part-time student founders shipping three papers in four months is tight; HKUST coursework eats calendar. If we can ship only two of the three, we drop the energy-bounded paper (most flexible scope) and ship Φ-Arena and the mechinterp paper. If we ship no papers, we still release the partial benchmark infrastructure (simulator adapters, opponent-conditioned eval harness), so the funding produces a public artifact either way.

Other risks: compute overrun (mitigation: scope to two-VLA matchup, plan v2 expansion); self-play training instability (mitigation: fall back to fixed opponent pool sampling); ICLR rejection (resubmit to NeurIPS 2027 or ICML 2027; substrate releases publicly at submission regardless of acceptance).
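The fixed-opponent-pool fallback can be sketched as follows: instead of always training against the latest copy of the learner, each episode draws its opponent from a frozen set of checkpoints. The class name `FixedOpponentPool`, the uniform sampling rule, and the checkpoint labels are assumptions for illustration; the actual fallback may weight opponents differently.

```python
import random


class FixedOpponentPool:
    """Hypothetical frozen pool of opponent checkpoints, sampled uniformly."""

    def __init__(self, checkpoints, seed=0):
        if not checkpoints:
            raise ValueError("opponent pool must contain at least one checkpoint")
        self._checkpoints = tuple(checkpoints)  # frozen: never updated in training
        self._rng = random.Random(seed)         # seeded for reproducible matchups

    def sample(self):
        """Pick the opponent checkpoint for the next episode."""
        return self._rng.choice(self._checkpoints)


# Usage: draw one opponent per training episode.
pool = FixedOpponentPool(["ckpt_a", "ckpt_b", "ckpt_c"], seed=0)
opponent = pool.sample()
```

Because the pool never changes, the learner faces a stationary opponent distribution, which is the property that makes this a fallback when pure self-play becomes unstable.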

Money raised in last 12 months. $0. Φ(fight) Research was founded May 2026. Wyoming LLC (Phi Fight Research LLC) filed 2026-05-16, pending SOS approval (expected active by 2026-05-27). The founders are self-funded HKUST students. Manifund is the first cash funding pursued.

Tier 0 compute grants in flight (both submitted 2026-05-16): TPU Research Cloud (free TPU credit, not cash); Lambda Research Grant (up to $5K compute credit, not cash).

Disbursement: recipient is Phi Fight Research LLC. The EIN will be requested by faxed Form SS-4 once the LLC is active; the Mercury bank account is expected to be active by mid-June 2026; wire instructions will be provided to Manifund once the Mercury account is active. Tax classification: partnership (multi-member LLC default). FinCEN BOI Report not required (2025-03-26 IFR exemption for U.S.-formed LLCs).
