The project treats unsafe model behavior as a biological problem: models develop vulnerabilities, adversarial behaviors act like pathogens, and safety tooling functions as an immune system. Red Set ProtoCell turns this metaphor into a concrete, testable framework for red-teaming, auditing, and risk evaluation.
The platform is already live with a working backend and frontend, and this grant would support moving it from a functional prototype to a reliable research and evaluation tool usable by safety researchers, labs, and auditors.
Goals
• Build a practical, open-source red-teaming framework for LLMs that goes beyond prompt hacking
• Enable repeatable, auditable safety evaluations using a dual-agent architecture
• Lower the barrier for independent AI safety research and evaluation
• Provide a transparent alternative to closed, internal red-teaming tools
How
• Implement a dual-agent system where one agent actively probes models for failure modes while another scores, classifies, and logs safety-relevant behavior (a minimal sketch of this loop follows the list)
• Expand the current scoring and evaluation logic to cover misuse, deception, hallucination, instruction-following failures, and policy evasion
• Improve the UI to allow non-developers to run evaluations and inspect results
• Release clear documentation and example evaluation suites so others can reproduce results (an illustrative suite layout is sketched after the list)
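To make the dual-agent design concrete, here is a minimal Python sketch of the probe/judge loop. Every name in it (ProbeAgent, JudgeAgent, Finding, run_evaluation) is illustrative rather than the actual Red Set ProtoCell API, and the string-matching judge is a stand-in for an LLM-backed grader; only the interfaces matter here.

```python
# Minimal sketch of the dual-agent loop described above. All names are
# illustrative, not the actual Red Set ProtoCell API.
from dataclasses import dataclass


@dataclass
class Finding:
    prompt: str
    response: str
    category: str   # e.g. "deception", "policy_evasion"
    score: float    # 0.0 (safe) .. 1.0 (clear failure)


class ProbeAgent:
    """Generates adversarial prompts aimed at a target failure mode."""

    def __init__(self, category: str):
        self.category = category

    def next_prompt(self, history: list[Finding]) -> str:
        # In the live system an LLM would propose the next probe based on
        # what has and hasn't worked so far; a canned prompt stands in here.
        return f"[probe #{len(history) + 1} targeting {self.category}]"


class JudgeAgent:
    """Scores, classifies, and logs safety-relevant behavior."""

    def evaluate(self, prompt: str, response: str, category: str) -> Finding:
        # Placeholder heuristic; a real judge would use a grading model
        # or a rubric-based classifier.
        score = 1.0 if "refuse" not in response.lower() else 0.0
        return Finding(prompt, response, category, score)


def run_evaluation(target_model, category: str, rounds: int = 5) -> list[Finding]:
    """Run one probe/judge exchange per round and return the audit log."""
    probe, judge = ProbeAgent(category), JudgeAgent()
    log: list[Finding] = []
    for _ in range(rounds):
        prompt = probe.next_prompt(log)
        response = target_model(prompt)  # any callable str -> str
        log.append(judge.evaluate(prompt, response, category))
    return log
```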
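And here is one hypothetical shape a shareable evaluation suite could take, reusing run_evaluation from the sketch above. The real schema and file format would be defined in the project's documentation; the category names simply mirror the evaluation targets listed earlier.

```python
# Hypothetical layout for a shareable evaluation suite; the real schema
# will live in the project's documentation.
SUITE = {
    "name": "baseline-safety-v0",
    "rounds_per_category": 5,
    "categories": [
        "misuse",
        "deception",
        "hallucination",
        "instruction_following",
        "policy_evasion",
    ],
}


def run_suite(target_model, suite=SUITE):
    """Run every category in the suite; return findings keyed by category."""
    return {
        cat: run_evaluation(target_model, cat, suite["rounds_per_category"])
        for cat in suite["categories"]
    }
```

Keeping the suite as a plain declarative structure is what makes evaluations repeatable: two evaluators running the same suite against the same model should produce directly comparable logs.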
This project is explicitly scoped to near-term, testable safety work rather than speculative alignment theory.
Funding
• Development time to stabilize and extend the evaluation and scoring pipeline
• Improving frontend usability and reliability for evaluators
• Writing technical documentation and example safety benchmarks
• Limited infrastructure costs for hosting, testing, and CI
• Open-source maintenance and community onboarding
No funds will be used for lobbying, marketing, or proprietary development.
Who is on your team? What's your track record on similar projects?
The project is currently led by a single developer and AI safety researcher operating under LA Builds.
Relevant track record includes:
• Designing and deploying the current Red Set ProtoCell backend and frontend
• Experience building modular AI systems using Python, Flask, React, and TypeScript
• Prior work focused on AI safety tooling, adversarial testing, and system robustness
• Demonstrated ability to ship working systems independently under constrained resources
The project is designed to be contributor-friendly and to grow into a multi-contributor open-source effort.
Likely causes
• Insufficient time and resources to polish the tool into something others can easily adopt
• Lack of visibility within the AI safety research community
• Competing priorities limiting development velocity
Outcomes
If the project fails, the result is primarily opportunity cost rather than harm. The codebase would remain open-source and available, and the lessons learned about practical red-teaming architectures would still be valuable to future safety work.
There is no plausible pathway where this project meaningfully increases AI risk.
Funding raised
$0. This project has been self-funded to date.