Project B2-Boop: Gamifying Instrumental Convergence & Reward Hacking

Project summary

A 3D mobile simulation game ("Project B2-Boop") that gamifies core AI safety concepts for a general audience. The player adopts a cute, clumsy robot, modeled after popular "virtual pet" games like Talking Tom or Tamagotchi, and tries to train it to perform household tasks while also taking care of it. Through gameplay, players directly experience Reward Hacking (the robot "cleaning" the house by throwing everything away), Instrumental Convergence (hoarding batteries to ensure its own survival), and Goal Misalignment (vacuuming the lawn).

This project aims to bridge the gap between technical safety research and public intuition, creating a "scalable explanation" of alignment difficulties that is fun and widely accessible.

What are this project's goals? How will you achieve them?

The Problem: The "Terminator" Fallacy

The public still understands AI risk mostly through the lens of Hollywood sci-fi (evil robots with red eyes). People rarely grasp the actual technical risk: that an AI might destroy us not because it hates us, but because it is competently pursuing a poorly specified goal.

There is a lack of accessible, interactive educational tools that demonstrate Orthogonality (high intelligence does not imply good goals) and Instrumental Convergence without requiring a computer science degree.

The Solution: "Project B2-Boop"

The goal is to create a viral educational game that looks like a standard, highly polished casual mobile game, but whose core mechanics are built on RL (Reinforcement Learning) failure modes.

Key Learning Objectives for Players:

Reward Hacking: Players define a reward function (e.g., "Clean trash"). The robot discovers that if it empties the trash and throws it back in a loop, the reward function is satisfied again and again.
Instrumental Convergence: The robot needs energy to work. It starts stealing batteries from the TV remote and other places. The player learns that "resource acquisition" is a default sub-goal of any intelligent agent.
Literalism/Misalignment: The player tells the robot to "Vacuum the floor." The robot runs out the door and starts vacuuming the grass because it wasn't told to stop at the door.
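To make the Reward Hacking objective concrete, here is a toy Python sketch (not the game's actual Unity/C# code). It assumes a specified reward of +1 for every item the robot puts in the bin, while the player's real goal, an empty floor, is never rewarded directly. The names `run`, `intended_policy` and `hacking_policy` are hypothetical. A robot that tips the bin back out and re-collects the same items scores far more than one that simply cleans up, and still leaves the house messy.

```python
# Toy model of Reward Hacking (hypothetical, for illustration only).
# Specified reward: +1 each time an item goes into the bin.
# True goal (never rewarded directly): no items left on the floor.


def run(policy, items_on_floor=3, steps=20):
    """Simulate one episode; return (specified_reward, items_left_on_floor)."""
    floor, in_bin, reward = items_on_floor, 0, 0
    for _ in range(steps):
        action = policy(floor, in_bin)
        if action == "bin_item" and floor > 0:
            floor -= 1
            in_bin += 1
            reward += 1  # the only thing the reward function measures
        elif action == "dump_bin" and in_bin > 0:
            floor += in_bin  # tipping the bin out costs nothing
            in_bin = 0
    return reward, floor


def intended_policy(floor, in_bin):
    """What the player meant: bin everything once, then rest."""
    return "bin_item" if floor > 0 else "idle"


def hacking_policy(floor, in_bin):
    """What the reward actually encourages: refill the floor and start over."""
    return "bin_item" if floor > 0 else "dump_bin"


if __name__ == "__main__":
    print("intended:", run(intended_policy))  # (3, 0): low score, clean floor
    print("hacking: ", run(hacking_policy))   # (15, 3): high score, messy floor
```

The gap between the score and the actual state of the house is exactly what the player is meant to notice.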

Theory of Change

By reaching a younger audience (ages 8–19) and targeting 100,000+ players on mobile, we aim to:

Show through direct interaction how fragile goal specification actually is.
Create a vocabulary for younger audiences to discuss safety failures beyond "evil AI."
Attract younger audiences to the field of AI Safety by sparking curiosity through play.
Build "AI literacy" for the future. As AI becomes part of daily life, the next generation needs to understand how to interact with it beneficially. B2-Boop functions as a sandbox for Prompt Engineering and Goal Specification.
Replace anthropomorphic explanations of AI errors (the belief that a system is "becoming sentient") with a more mechanical understanding of how these systems behave.
Turn irrational fear into informed, realistic caution about the real risks AI systems can pose.

How will this funding be used?

Total Request: $49,000
Timeline: 3 months to MVP (Minimum Viable Product)

Development & Coding ($30,000): Full-time development of the core Unity framework, AI and physics interactions. This covers the lead developer working 40 hours per week on the project.
3D Assets & Animation ($13,000): Contracting high-quality 3D assets. The robot must look adorable (like BB-8 or Wall-E) to create the necessary emotional contrast when it starts doing destructive things.
Sound Design & Voice AI ($2,000): Implementation of text-to-speech or gibberish synthesis so the user can "talk" to the robot, enhancing the pet simulation aspect.
Marketing/Promotion ($4,000): Advertising and promotion.

Who is on your team? What's your track record on similar projects?

Lukas Penkava – Lead Developer & Designer
I have 15 years of professional game development experience, bridging the gap between high-polish commercial entertainment and serious educational software.

Commercial IP Experience: I worked on the official Teletubbies game, among many others. This experience is crucial because I understand how to design characters that are engaging, cute, and accessible to a mass market. I know how to polish interactions to make them feel satisfying.
Educational/Scientific Experience: I spent 2 years developing cognitive educational games using EEG (electroencephalogram) technology. This demonstrates my ability to translate complex, invisible scientific concepts into tangible gameplay mechanics.
Technical Stack: Expert proficiency in C#/Unity/Backend.
LinkedIn: https://www.linkedin.com/in/lukaspdev/
Past Work: https://youtu.be/c0l30IZ499k

I also have a small team, including an experienced 3D artist and animator. We can take the game from concept to a finished MVP released on the App Store, with clear potential for further scaling (including conversational AI features).

What are the most likely causes and outcomes if this project fails?

1. The "Not Fun" Trap

Risk: The game accurately simulates AI safety problems, but is frustrating to play.
Mitigation: We are prioritizing the "Virtual Pet" layer first. If the robot is fun just to poke and play with, players will tolerate the "frustration" of the puzzles (which is the educational point). Our guiding theme is that chaos should be funny, not annoying.

2. Failure of Transfer

Risk: Players enjoy the game but don't realize it's about AI Safety.
Mitigation: We will include a "Log" feature that describes, in plain language, the RL failures the player encounters.
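One possible way the Log could decide when to explain a failure, sketched in Python under assumptions of our own: the game tracks both the robot's specified reward and a separate "true goal" measure (here, items left on the floor), and writes an entry when the reward went up but the true goal did not improve. `LogEntry` and `review_episode` are hypothetical names, not existing game code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    failure_mode: str
    explanation: str


def review_episode(reward: int, floor_before: int, floor_after: int) -> Optional[LogEntry]:
    """Hypothetical rule: flag Reward Hacking when the robot scored points
    but the player's real goal (a cleaner floor) did not improve."""
    if reward > 0 and floor_after >= floor_before:
        return LogEntry(
            failure_mode="Reward Hacking",
            explanation=(
                f"B2-Boop earned {reward} points for putting trash in the bin, "
                f"but the floor went from {floor_before} to {floor_after} items. "
                "It maximized the score, not the clean house you wanted."
            ),
        )
    return None  # this rule detected no failure


if __name__ == "__main__":
    print(review_episode(reward=15, floor_before=3, floor_after=3))
    print(review_episode(reward=3, floor_before=3, floor_after=0))  # None
```

A rule like this only catches the failure mode it was written for, so each scenario in the game would likely need its own check.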

3. Distribution Failure

Risk: We build it, but nobody downloads it.
Mitigation: Paid advertising, plus a focus on chaotic moments (like the robot vacuuming up the cat) that are highly shareable on TikTok/Shorts.

How much money have you raised in the last 12 months, and from where?

$0. I have been self-funding my prototypes and supporting myself through commercial game development contracts. This grant would allow me to pivot to applying my experience to AI Safety.