Manifund foxManifund
Home
Login
About
People
Categories
Newsletter
HomeAboutPeopleCategoriesLoginCreate

Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval
1

Addressing Agentic AI Risks Induced by System Level Misalignment

Technical AI safetyGlobal catastrophic risks
cybersnacker avatar

Preeti Ravindra

ProposalGrant
Closes January 30th, 2026
$0raised
$1,000minimum funding
$4,000funding goal

Offer to donate

40 daysleft to contribute

You're pledging to donate if the project hits its minimum goal and gets approved. If not, your funds will be returned.

Sign in to donate

Project summary

We aim to meet the moment by grounding AI safety research in messy, real-world deployments where agentic systems already operate and fail. While the broader community continues important work on eliciting model capabilities and red-teaming for failure modes, we are motivated by the lack of attention to the software engineering layer and the scarcity of actionable blue-team tools that help defenders address harms to agentic systems from external actors and reduce harms from agentic systems to the environment. Our work will deliver code/integrations with popular agentic frameworks with a secondary goal of a research paper.

What are this project's goals? How will you achieve them?

Problem: Most alignment work focuses on user level failures like jailbreaks or model level failures like reward hacking. However, agentic systems operate as software services with shell access, memory, and network privileges. All of this typically gets reduced to “scaffolding”, but these critical components for deploying and delivering AI agents also cause system-level risks. This creates bidirectional failure modes where
1. Lack of security controls lead to misalignment models

2.Misaligned models further exacerbate security and system-level risks like control subversion in insider threat scenarios and data exfiltration, privilege escalation 

Goal: Build defense mechanisms that can be democratized in the developer ecosystem where developers and deployers have a choice to choose safety and security instead of irresponsible deployments. We will execute two streams of work.

Stream 1: We will conduct red teaming against existing frameworks like LangChain and CrewAI to highlight security gaps and then build portable defense mechanisms. We will develop the proposed continuation that the Google Deep Mind research team laid out and combine CaMeL with Progent into an open source, defensive library that demonstrably blocks our suite of attacks (e.g. indirect prompt injection leading to shell execution). Our primary goal is to democratize defense leveraging existing research and contributing new defensive research.

Stream 2: We will prototype AI Control mechanisms combining classical control flow theory, graph analysis and reinforcement learning methods to check for security posture regressions resulting from commit delta modifications to a system by an agent to address a few threat models outlined in Redwood Research’s blog that impacts critical production systems. Our goal for Stream 2 is to build systems that remain secure despite the agent itself being untrustworthy. Treat all code (AI or human-authored) as untrusted until proven secure, build mechanisms that continuously detect and block posture regressions at commit time.

Timeline:

  • January: Experiment design. 

    • Stream 1: We will focus on rapid prototyping attacks within open source frameworks. 

    • Stream 2: Dataset curation and shortlisting approaches

  • February - March: Execution

    • Stream 1: Red team and Blue team phase. We will execute attacks and implement proposed mitigations.

    • Stream 2: Proof-of-Concept v1 to gauge effectiveness of different approaches to catch security posture changes.

  • April: Evaluation and Research Outputs Synthesis

    • Stream 1: We will package tools for GitHub release, 

    • Stream 2: Write up findings and implementation steps

Share results at the AI Safety Unconference.

  • Mid-May: Finalize and submit our workshop proposal to DEFCON.

  • Mid-May - July (Follow-on): Deepen research into Stream 2 (Infrastructure as Code security regressions) and retain high performing team members to convert prototypes into robust open source libraries.

How will this funding be used?

We have recruited 8 participants via the AI Safety Camp application pool. To ensure this larger team can execute high quality research and development, we are looking for grants to support the team’s development activities such as tooling and compute  that our current resources do not cover.

With Target Funding ($4000): We will equip our 10 person team with supporting infrastructure to maximize execution velocity and output.

  • $1,600 (Engineering Velocity): 4 months of AI development IDEs for 10 users to maximize coding velocity and to elicit code that undermines system safety and security

  • $2,000 (API tokens and compute): OpenRouter Credits or equivalent compute for the entire team for training/inference which will be distributed based on tasks.

  • $400 (Other tooling): Subscriptions to other software and project management tooling like Langsmith, Notion

A minimum $1000 funding in addition to our existing funding will be applied to the above categories per the leads' discretion and $4000 is our desired target

Who is on your team? What's your track record on similar projects?

  • Evan Harris (Stream 1 Lead): Software engineer and security researcher with a focus on MCP servers. SPAR 2025 Fellow, track record of multiple vulnerability disclosures and bug bounties in security. Professional software developer since 2018.

  • Preeti Ravindra (Stream 2 Lead): AI Security Research Lead with 9 years of experience working at the intersection of AI and Security in industry research labs. Holds multiple publications and patents in AI Security, thought leader and invited speaker at leading security conferences. Serves on the review board of academic and industry conferences like CAMLIS and [un]promptedcon

Both professionals have completed a previous iteration of AI safety camp and have successfully published papers.

What are the most likely causes and outcomes if this project fails?

  1. Ineffective Mitigations: Our defenses might be too heavy handed and may sacrifice agent usability.

    • Outcome: Tools are ignored by the community.

    • Mitigation: We will track usability scores alongside security metrics to ensure practicality.

  2. Failure to Reproduce Certain Classes of Threats: Most of these threat models are already rampant and some are emerging minimizing risk. However, there are some research components to this project where the chosen agentic systems may be too unsophisticated to execute complex attacks or unexpectedly robust for some classes of threats.

    • Outcome: Scope reduction on threat classes/models

    • Mitigation: We will timebox vulnerability discovery and threat elicitation to 3 weeks and pivot to different models  if initially tested models are robust.

How much money have you raised in the last 12 months, and from where?

Project inception began in Oct 2025 and we're looking at Manifund to be one of the key sources of funding. We have secured initial support which provides a starting base, but is insufficient for the compute and tooling costs of a 10 person team:

  • $1,000 in general support grant from AI Safety Camp.

  • $1000 as compute credits split across different compute providers

CommentsOffersSimilar8
remmelt avatar

Remmelt Ellen

11th edition of AI Safety Camp

Cost-efficiently support new careers and new organisations in AI Safety.

Technical AI safetyAI governance
25
31
$45.1K raised
Apart avatar

Apart Research

Apart Research: Research and Talent Acceleration

Support the growth of an international AI safety research and talent program

Science & technologyTechnical AI safetyAI governanceEA communityGlobal catastrophic risks
6
1
$0 raised
Eko avatar

Marisa Nguyen Olson

Building an AI Accountability Hub

Case Study: Defending OpenAI's Nonprofit Mission

Technical AI safetyAI governanceBiosecurityGlobal catastrophic risks
1
2
$0 raised
Apart avatar

Apart Research

Help Apart Expand Global AI Safety Research

Incubate AI safety research and develop the next generation of global AI safety talent via research sprints and research fellowships

Science & technologyTechnical AI safetyAI governanceEA communityGlobal catastrophic risks
8
12
$17.7K raised
AlexandraBos avatar

Alexandra Bos

AI Safety Research Organization Incubator - Pilot Program

3
7
$16K raised
Apart avatar

Apart Research

Keep Apart Research Going: Global AI Safety Research & Talent Pipeline

Funding ends June 2025: Urgent support for proven AI safety pipeline converting technical talent from 26+ countries into published contributors

Technical AI safetyAI governanceEA community
33
39
$131K raised
Adam-Shai avatar

Adam Shai

Simplex - building our research team

Fund a new research agenda, based on computational mechanics, bridging mechanism and behavior to develop a rigorous science of AI systems and capabilities.

Science & technologyTechnical AI safety
6
2
$0 raised
Peter-Vamplew avatar

Peter Vamplew

Mitigating Reward Misspecification in Reinforcement Learning

Mitigating Reward Misspecification in Reinforcement Learning Using Multiple Independent Reward Specifications and Multi-objective Reinforcement Learning

1
0
$0 raised