The project treats unsafe model behavior as a biological problem: models develop vulnerabilities, adversarial behaviors act like pathogens, and safety tooling functions as an immune system. Red Set ProtoCell turns this metaphor into a concrete, testable framework for red-teaming, auditing, and risk evaluation.
The platform is already live with a working backend and frontend, and this grant would support moving it from a functional prototype to a reliable research and evaluation tool usable by safety researchers, labs, and auditors.
Goals
• Build a practical, open-source red-teaming framework for LLMs that goes beyond prompt hacking
• Enable repeatable, auditable safety evaluations using a dual-agent architecture
• Lower the barrier for independent AI safety research and evaluation
• Provide a transparent alternative to closed, internal red-teaming tools
How
• Implement a dual-agent system where one agent actively probes models for failure modes while another scores, classifies, and logs safety-relevant behavior (a minimal sketch of this loop follows the list)
• Expand the current scoring and evaluation logic to cover misuse, deception, hallucination, instruction-following failures, and policy evasion
• Improve the UI to allow non-developers to run evaluations and inspect results
• Release clear documentation and example evaluation suites so others can reproduce results (an illustrative suite layout is sketched after the list)
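To make the dual-agent design concrete, here is a minimal Python sketch of the probe/judge loop. Every name in it (ProbeAgent, JudgeAgent, Finding, run_evaluation) is illustrative rather than the actual Red Set ProtoCell API, and the string-matching judge is a stand-in for an LLM-backed grader; only the interfaces matter here.

```python
# Minimal sketch of the dual-agent loop described above. All names are
# illustrative, not the actual Red Set ProtoCell API.
from dataclasses import dataclass


@dataclass
class Finding:
    prompt: str
    response: str
    category: str   # e.g. "deception", "policy_evasion"
    score: float    # 0.0 (safe) .. 1.0 (clear failure)


class ProbeAgent:
    """Generates adversarial prompts aimed at a target failure mode."""

    def __init__(self, category: str):
        self.category = category

    def next_prompt(self, history: list[Finding]) -> str:
        # In the live system an LLM would propose the next probe based on
        # what has and hasn't worked so far; a canned prompt stands in here.
        return f"[probe #{len(history) + 1} targeting {self.category}]"


class JudgeAgent:
    """Scores, classifies, and logs safety-relevant behavior."""

    def evaluate(self, prompt: str, response: str, category: str) -> Finding:
        # Placeholder heuristic; a real judge would use a grading model
        # or a rubric-based classifier.
        score = 1.0 if "refuse" not in response.lower() else 0.0
        return Finding(prompt, response, category, score)


def run_evaluation(target_model, category: str, rounds: int = 5) -> list[Finding]:
    """Run one probe/judge exchange per round and return the audit log."""
    probe, judge = ProbeAgent(category), JudgeAgent()
    log: list[Finding] = []
    for _ in range(rounds):
        prompt = probe.next_prompt(log)
        response = target_model(prompt)  # any callable str -> str
        log.append(judge.evaluate(prompt, response, category))
    return log
```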
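And here is one hypothetical shape a shareable evaluation suite could take, reusing run_evaluation from the sketch above. The real schema and file format would be defined in the project's documentation; the category names simply mirror the evaluation targets listed earlier.

```python
# Hypothetical layout for a shareable evaluation suite; the real schema
# will live in the project's documentation.
SUITE = {
    "name": "baseline-safety-v0",
    "rounds_per_category": 5,
    "categories": [
        "misuse",
        "deception",
        "hallucination",
        "instruction_following",
        "policy_evasion",
    ],
}


def run_suite(target_model, suite=SUITE):
    """Run every category in the suite; return findings keyed by category."""
    return {
        cat: run_evaluation(target_model, cat, suite["rounds_per_category"])
        for cat in suite["categories"]
    }
```

Keeping the suite as a plain declarative structure is what makes evaluations repeatable: two evaluators running the same suite against the same model should produce directly comparable logs.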
This project is explicitly scoped to near-term, testable safety work rather than speculative alignment theory.
Funding
• Development time to stabilize and extend the evaluation and scoring pipeline
• Improving frontend usability and reliability for evaluators
• Writing technical documentation and example safety benchmarks
• Limited infrastructure costs for hosting, testing, and CI
• Open-source maintenance and community onboarding
No funds will be used for lobbying, marketing, or proprietary development.
Who is on your team? What's your track record on similar projects?
The project is currently led by a single developer and AI safety researcher operating under LA Builds.
Relevant track record includes:
• Designing and deploying the current Red Set ProtoCell backend and frontend
• Experience building modular AI systems using Python, Flask, React, and TypeScript
• Prior work focused on AI safety tooling, adversarial testing, and system robustness
• Demonstrated ability to ship working systems independently under constrained resources
The project is designed to be contributor-friendly and to grow into a multi-contributor open-source effort.
Likely causes
• Insufficient time and resources to polish the tool into something others can easily adopt
• Lack of visibility within the AI safety research community
• Competing priorities limiting development velocity
Outcomes
If the project fails, the result is primarily opportunity cost rather than harm. The codebase would remain open-source and available, and the lessons learned about practical red-teaming architectures would still be valuable to future safety work.
There is no plausible pathway where this project meaningfully increases AI risk.
Funding raised
$0. This project has been self-funded to date.