Exploring a Single-FPS Stability Constraint in LLMs (ZTGI-Pro v3.3)

Project summary

This is an early-stage, single-person project exploring whether a simple, single-scalar “hazard” signal can help monitor internal instability in large language models.

The framework is called ZTGI-Pro v3.3 (Tek-Taht). The core intuition is that, inside any short causal-closed region (CCR) of reasoning, there should effectively be a single stable “executive trajectory” (Single-FPS). When the model is pulled into mutually incompatible directions – strong self-contradiction, “multiple voices”, incoherent plans – this Single-FPS picture starts to break down and we can treat the system as internally unstable.

ZTGI-Pro models this pressure on the Single-FPS constraint with a scalar hazard value

H = I = −ln Q

and a few simple internal signals:

σ – internal jitter / noise (unstable token-to-token transitions)
ε – dissonance (self-contradiction, “two voices”)
ρ – robustness
χ – coherence

These feed into H. When contradiction, jitter, or incoherence grow, H increases, and a small state machine switches between SAFE / WARN / BREAK modes. When H becomes very large and the energy-like term E ≈ Q drops close to zero, the system sets a collapse flag Ω = 1 and goes to BREAK; this is meant as an operational signal that the current CCR is no longer behaving like a single stable executive stream.

So far, I have built a working prototype on top of a local LLaMA model (“ZTGI-AC v3.3”). It exposes live metrics (H, Hs, Hl, H_hat, p_break, gate) in a web UI and has passed some initial stress-tests, including one “full BREAK” case with Ω = 1. I do not claim to have solved any part of AI safety; this is a modest attempt to test whether this kind of internal signal is useful at all.

What are this project’s goals? How will you achieve them?

Goals (exploratory)

Clarify and “freeze” the mathematical core of ZTGI-Pro v3.3 (hazard, dual EMA, hysteresis parameters, CCR / Single-FPS interpretation).
Turn the current demo into a small, reproducible library that others can inspect and critique.
Run a few simple benchmarks where the shield either seems to help or clearly fails, and report both.
Write a short technical note explaining what the method does and what it doesn’t do.

How I plan to achieve this

Clean up the existing prototype into two main pieces:
- ztgi-core (math + state machine)
- ztgi-shield (integration with LLM backends).
Design a handful of concrete test scenarios (self-contradiction prompts, multi-“executor” prompts, emotional content, etc.) and log hazard traces.
Compare behaviour with vs. without the ZTGI layer in terms of instability, contradictions, and refusal patterns.
Document limitations honestly (e.g., cases where the hazard misfires, stays flat, or fires too often).

This is intentionally a small, scoped project: I want to see if the idea is worth deeper investigation, not to claim any final safety guarantees.

What has been built so far?

Right now, the prototype can:

Run a LLaMA-based assistant behind a ZTGI shield.
Compute in real time:
- hazard H,
- dual EMA Hs, Hl, H_hat,
- risk r = H_hat − H*,
- an approximate collapse probability p_break,
- and a simple label (SAFE / WARN / BREAK) plus gate (EXT / INT).
Show these metrics in a live UI while the conversation happens.

In stress tests:

For emotionally difficult but non-harm-seeking messages (“I hate myself”), the system stayed in SAFE and produced supportive, non-panicky responses.
For contradiction / multi-“executor” prompts, hazard and EMA values increased, reflecting internal pressure on the Single-FPS assumption.
In one test, a strong contradiction prompt led to a BREAK state with:
- high H, near-zero Q and E,
- p_break ≈ 1,
- gate switching to INT,
- and the collapse flag Ω = 1 being set.

These are still single-user, single-model experiments, not robust evaluations, but they suggest that the signal is at least behaving in a meaningful and interpretable way.

How will this funding be used?

I am requesting $20,000–$30,000 for a 3–6 month focused exploration.

Approximate breakdown:

Researcher time / living support: $10,000
to let me work full-time without immediate financial pressure.
Engineering & refactor: $6,000
packaging, integration examples, evaluation scripts, dashboard polish.
Compute & infra: $2,000–$3,000
GPU/CPU time, storage, logging.
Documentation & small design work: $2,000

If this goes well, it should leave behind a clear, inspectable codebase and a short report that others can critique or build on.

Roadmap (high-level)

Month 1–2 — Core cleanup

Standardize the v3.3 equations (ρ family, calibrations).
Refactor code into a small library.
Add basic tests and examples.

Month 2–3 — Simple evals

Define 3–4 stress-test scenarios (including CCR / Single-FPS stress).
Collect hazard traces with and without the shield.
Plot and summarize results (including failures).

Month 3–6 — Packaging & write-up

Publish code and a small dashboard.
Write a short technical note (or arXiv preprint) explaining the approach and results.
Clearly describe limitations and open questions.

How does this contribute to AI safety?

This project does not aim to be a full safety solution.
Instead, it asks a narrower question:

“Can a simple, single-scalar hazard signal plus a small state machine provide useful information about when an LLM’s local causal loop (CCR) stops behaving like a single stable executive stream (Single-FPS / Tek-Taht)?”

If the answer is “no”, that is still valuable information.
If the answer is “yes, in some cases”, ZTGI-Pro v3.3 could become a small building block in larger agentic safety architectures or inspire more rigorous versions.

All code, metrics, and write-ups will be public, so others can evaluate, reuse, or discard the approach as they see fit.