Project Summary
Lunaris MoC (Mixture of Collaboration) is a new open-source architecture for sparse language models. Standard Mixture-of-Experts (MoE) models route tokens to independent experts that never communicate. MoC introduces a mediator hub enabling O(K) bidirectional information flow between experts, combined with learned per-token adaptive compute depth.
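To make the mediator idea concrete, here is a minimal sketch of a mediator hub in PyTorch. All names and shapes are assumptions for illustration, not the actual Lunaris implementation: each of the K experts produces an output, the hub pools them into one shared summary (O(K) work), and a bounded gate broadcasts that summary back so every expert sees shared context.

```python
import torch
import torch.nn as nn

class MediatorHub(nn.Module):
    """Hypothetical mediator hub: pool K expert outputs into a shared
    summary, then broadcast a bounded feedback signal back to all experts.
    This is an illustrative sketch, not the Lunaris MoC code."""

    def __init__(self, d_model: int):
        super().__init__()
        self.collect = nn.Linear(d_model, d_model)    # expert -> hub
        self.broadcast = nn.Linear(d_model, d_model)  # hub -> experts

    def forward(self, expert_outs: torch.Tensor) -> torch.Tensor:
        # expert_outs: (batch, n_experts, d_model)
        hub = self.collect(expert_outs).mean(dim=1, keepdim=True)  # one summary, O(K)
        feedback = torch.tanh(self.broadcast(hub))                 # bounded gate
        return expert_outs + feedback                              # shared context added

hub = MediatorHub(d_model=64)
x = torch.randn(2, 8, 64)   # batch of 2, 8 experts
y = hub(x)
print(tuple(y.shape))        # (2, 8, 64)
```

The residual form (`expert_outs + feedback`) means the hub can only add information on top of the standard independent-expert computation, so the MoE baseline is recovered when the feedback is zero.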
Initial experiments at 64M parameters show MoC achieving 6.2% lower perplexity than MoE at identical parameter counts. This project seeks funding to validate these results at 1-2B parameter scale with rigorous ablations.
Repository: https://github.com/Auren-Research/lunaris
Goals and Approach
Primary goal: Determine whether MoC's advantages over MoE hold at 1B+ parameter scale through controlled experiments.
Specific deliverables:
- Train 4 models at 1B parameters, 10-15B tokens each:
  - MoE baseline (standard top-2 routing, no collaboration)
  - MoC complete (mediator + expert feedback + iterative reasoning)
  - Ablation: IRL-only (iterative reasoning without collaboration)
  - Ablation: collaboration-only (mediator without extra reasoning steps)
- Publish all results, configs, training logs, and model weights as open source
- Write and release a technical report documenting methodology, results, and analysis
- Release trained models on HuggingFace for the community to evaluate
How: Rent a single B200 or equivalent GPU on RunPod/Lambda, run each experiment for ~5-7 days, compare validation loss, perplexity, and routing behavior across all configurations.
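The head-to-head comparison comes down to a simple relationship: perplexity is the exponential of the mean per-token negative log-likelihood, so relative perplexity improvements (like the 6.2% pilot figure) fall directly out of the validation losses. A minimal helper, with made-up loss values for illustration:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll)

def relative_improvement(ppl_baseline: float, ppl_candidate: float) -> float:
    """Fraction by which the candidate's perplexity beats the baseline."""
    return (ppl_baseline - ppl_candidate) / ppl_baseline

# Illustrative losses only: MoE at 2.90 nats/token vs MoC at 2.85.
ppl_moe = perplexity(2.90)
ppl_moc = perplexity(2.85)
print(f"{relative_improvement(ppl_moe, ppl_moc):.1%}")  # → 4.9%
```

Note that a small gap in loss translates into a noticeably larger relative perplexity gap, which is why both metrics are reported.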
How Funding Will Be Used
100% of funding goes to GPU compute. Breakdown at current cloud pricing:
| Funding level | What it buys |
| --- | --- |
| $500 (minimum) | 2 runs at 500M params (MoC + MoE baseline), ~10B tokens each. Proves or disproves the architecture at 8× the scale of current experiments. |
| $1,500 | 2 runs at 1B params (MoC + MoE), ~10B tokens each. Meaningful scale with head-to-head comparison. |
| $3,000 | 4 runs at 1B params (MoC + MoE + 2 ablations), ~15B tokens each. Full ablation suite isolating which components drive improvements. |
| $5,000 (goal) | Full suite at 1B + exploratory run at 2B params. Strongest possible evidence for or against the architecture. |
No funding goes to salary, equipment, or overhead. I already have the code, the data pipeline, and the experimental methodology. The only bottleneck is compute.
Team and Track Record
Solo project by Francisco Antonio (17, Brazil).
- Designed and implemented the full MoC architecture from scratch (~1,500 lines of PyTorch)
- Built a complete training pipeline with routing diagnostics, adaptive compute tracking, gradient checkpointing, and wandb integration
- Ran 4 controlled experiments comparing vanilla/MoE/MoC architectures on a single A10 GPU, spending $8.50 of cloud credits
- Curated a 150B-token pretraining dataset (FineWeb-Edu + FineMath + Stack-Edu), published on HuggingFace
- Active on HackerOne with verified security research for blockchain companies
- All work is public and verifiable at https://github.com/Auren-Research/lunaris
I have no institutional affiliation, no advisor, and no prior funding. Everything built so far was done independently with personal resources.
If This Project Fails
Most likely failure mode: MoC shows no significant advantage over MoE at 1B scale, which would indicate the improvements seen at 64M were due to small-scale dynamics that don't transfer.
What happens then:
- All results (positive or negative) are published openly — negative results have scientific value
- The training infrastructure, dataset, and codebase remain useful for the community
- The ablation data reveals which components (mediator, feedback, adaptive compute) work and which don't, informing future architecture research
- I continue iterating on the architecture based on what the data shows
Lower probability failure: Training instability at scale (loss spikes, divergence). Mitigated by QK-norm, bounded gates throughout the architecture, and the ability to resume from checkpoints.
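QK-norm is a standard stability technique: normalizing queries and keys before the attention dot product bounds the logits, which prevents the logit blow-ups behind many loss spikes at scale. A generic sketch of the technique follows (illustrative only, not the Lunaris attention code):

```python
import torch
import torch.nn.functional as F

def qk_norm_attention(q, k, v, scale: float = 10.0):
    """Attention with L2-normalized queries and keys (QK-norm).

    After normalization every q·k dot product lies in [-1, 1], so the
    logits are bounded by ±scale regardless of activation magnitudes.
    """
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    logits = scale * q @ k.transpose(-2, -1)  # bounded logits
    return F.softmax(logits, dim=-1) @ v

q = torch.randn(1, 4, 8, 16)  # (batch, heads, seq, head_dim)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)
out = qk_norm_attention(q, k, v)
print(tuple(out.shape))  # (1, 4, 8, 16)
```

The fixed `scale` here stands in for the learned or constant temperature a real implementation would use; the point is only that the logits cannot diverge as weights grow.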
This is not an all-or-nothing bet. Even a partial result (e.g., "collaboration helps but adaptive compute doesn't") is publishable and valuable.
Funding History
$0 raised in the last 12 months.
No grants, no sponsorships, no investments. The $8.50 spent on initial experiments came from personal funds (the last of $10.99 in Lambda cloud credits). Currently awaiting responses from NLnet, Thiel Fellowship, and Emergent Ventures.