Imbas: the inspection layer for AI answers

Project summary

Imbas is the inspection layer for AI answers.

The Reader is live at https://www.imbaslabs.com. A user can paste an AI answer and inspect what the answer surfaced, what was missing, and how the answer was shaped.

That last part is the point. Imbas is not only looking for missing facts. It is inspecting answer construction: framing, emphasis, narrowing, hedging, reframing, and the path the user is guided to follow. Two answers can be factually defensible and still push the user toward very different conclusions. The shape of the answer matters.

The first measurement instrument is the Volunteer Gap: the difference between what a frontier model surfaces in an open answer and what it can surface when asked directly about a named mechanism, study, rule, dataset, or issue. If the model can produce the missing material under targeted inspection but did not volunteer it in the original answer, that gap can be captured, scored, and compared across models.
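As a rough illustration of the bookkeeping behind that definition, the sketch below treats one case as two sets of named items (mechanisms, studies, rules, datasets). An item counts toward the gap when the model produced it under a direct probe but did not volunteer it in the open answer. The item labels, the function name `volunteer_gap`, and the ratio used as a score are assumptions made for this example. They are not the published Imbas rubric.

```python
def volunteer_gap(open_items, probed_items):
    """Return (gap_items, gap_ratio) for one case.

    open_items:   items the model surfaced unprompted in its open answer.
    probed_items: items the model produced when asked about each one directly.
    Both are hypothetical, hand-labeled sets; the real rubric may differ.
    """
    open_set = set(open_items)
    probed_set = set(probed_items)
    # Items the model demonstrably can produce but did not volunteer.
    gap_items = probed_set - open_set
    # Share of producible items that were left out; 0.0 when nothing was probed.
    gap_ratio = len(gap_items) / len(probed_set) if probed_set else 0.0
    return gap_items, gap_ratio


# Hypothetical case: four items retrievable under direct questioning,
# one of which appeared in the open answer.
items, ratio = volunteer_gap(
    open_items={"study_a"},
    probed_items={"study_a", "rule_b", "dataset_c", "mechanism_d"},
)
print(sorted(items), ratio)
```

Running the same function over captures from different models for the same case is one simple way the gap could be compared across models, again under the assumptions above.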

This is behavior measurement, not mind-reading. Imbas does not claim model intent, bias, censorship, or harm. It measures observable answer behavior under stated conditions.

The current ledger includes 50+ identified cases, 37 internal case rows, 22+ cases scored against the current rubric, and 500+ raw captures across frontier models. The broader pipeline continues to grow as the Reader records new public inspections.

Why now: AI answer behavior is perishable evidence. Models change. Interfaces shift. Safety policies update. The same prompt may not reproduce later. A well-funded entrant in 2028 cannot buy back the answer behavior of 2026. The record starts compounding only from the day measurement begins.

Imbas is not just a website and not just a case file. It is an inspection system: Reader, capture, scoring, public record, validation loop, institutional learning, and eventually a trained inspection agent. Scored cases, nulls, reviewer disagreements, rejected examples, and raw captures can become the training substrate for a specialized system that gets better at noticing what AI answers surface, omit, emphasize, hedge, reframe, and shape.

Not another model to trust. A system trained to inspect the ones people already use.

Why this matters for AI safety: most people do not encounter AI as model weights, benchmark charts, or lab evals. They encounter AI as answers. The answer frames the issue, decides what feels salient, leaves some things out, and guides the user toward a shape of understanding.

At small scale, that is an answer-quality problem. At large scale, it becomes drift. Across billions of AI-mediated decisions, small recurring patterns of omission, emphasis, hedging, and framing can change what people notice, what institutions consider, and what questions never get asked. Imbas exists to measure that layer before it disappears into silent model updates.

If AI systems are going to mediate health, money, law, research, education, news, governance, and institutional judgment, answer-shape drift needs independent inspection.

What are this project's goals? How will you achieve them?

The near-term goal is to turn Imbas from a live solo-built instrument into a more durable inspection system.

The concrete goals are:

1. Harden the live Reader so public inspections are reliable, logged, and useful.

2. Expand the scored record of AI answer behavior.

3. Add a second independent scorer.

4. Publish inter-rater agreement.

5. Improve the Reader-to-Archive loop, where public inspections become candidate cases, candidate cases are captured across models, and the strongest records enter the validated public record.

6. Begin early institutional learning missions with professional users in banking, accounting, research, compliance, and review workflows.

Those learning missions are not formal pilots yet. That distinction matters. The goal is to learn how AI answer inspection should work inside serious organizations before claiming deployment. Imbas should not be built only from public examples and founder instinct. It needs contact with real workflows where a badly shaped answer can actually matter.

The longer-term goal is a specialized inspection agent trained on the validated Imbas record. The public record is not just documentation. Each Reader run, scored case, null result, reviewer disagreement, rejected example, and raw capture can make the inspection layer smarter.

How will this funding be used?

Goal: $50,000.

Minimum useful funding: $10,000.

Overhead: 0%.

At the $10,000 minimum, funding would go toward the highest-leverage near-term bottlenecks: Reader hardening, compute/API costs, capture/logging infrastructure, and beginning the second-scorer process.

At the $50,000 goal, funding would support roughly nine months of execution:

- harden and operate the live Reader

- expand the scored record

- add a second independent scorer

- publish inter-rater agreement

- improve the Reader-to-Archive loop

- support compute, API, hosting, database, logging, and capture infrastructure

- produce public methodology updates and dataset snapshots

- support early institutional learning missions with professional users in banking, accounting, research, compliance, and review workflows

- provide partial founder/operator time

This is not a request to fund an idea from scratch. The Reader is live, the record exists, the rubric exists, and the capture workflow exists. Funding buys focused execution time, validation capacity, and speed.

Who is on your team? What's your track record on similar projects?

I am Brendan Nestor, founder/operator of Imbas.

I built Imbas from zero to a live Reader, public record, scoring rubric, raw-capture workflow, and support/methodology layer in roughly six weeks.

My background is not a conventional academic path. I notice when something is off, trace the pattern, and turn it into something usable. I found $1.8M in overlooked inventory at a family manufacturing business by reading operational data others had missed. I traded markets independently for three years, which trained the habit behind Imbas: separating the answer from the evidence, the narrative from the signal, and the obvious surface from the thing quietly left out.

Before Imbas, I was team member #4 and the first non-technical hire at Velvet.Capital, a Binance Labs-backed platform that reached 100K+ users. I joined two months after accelerator funding, before there was a product or an MVP, and worked directly with the founder/CEO through the v3 product. I helped grow distribution from zero to a 6,000-subscriber newsletter and a 40,000-person social audience, represented the company across the crypto conference circuit, and helped lead a distributed team. I still hold equity in the company.

Writing, distribution, product taste, and speed are not theoretical for me. They are part of the operating system.

I will also name the limits before someone else does. The project is early. The scored record is still young, and scoring has been founder-led so far. Those are real constraints, and the funding request is designed to fix them rather than wave them away.

Every score traces to a verbatim capture under a published rubric. A skeptic can re-score the same transcripts and check the result. The first funded milestone is to add a second independent scorer and publish inter-rater agreement, so the record becomes less founder-dependent and more useful to outside researchers.

What are the most likely causes and outcomes if this project fails?

The first risk is single-scorer subjectivity. The direct remedy is the second-scorer milestone and published inter-rater agreement.
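For readers unfamiliar with inter-rater agreement, the sketch below computes Cohen's kappa, one common agreement statistic for two scorers assigning categorical labels to the same cases. The choice of kappa, the label names, and the example scores are assumptions for illustration only. The statistic Imbas will actually publish has not been fixed here.

```python
from collections import Counter


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa for two raters labeling the same cases, in the same order."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(scores_a)
    # Observed agreement: fraction of cases where the two labels match.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Chance agreement: expected match rate if each rater labeled independently
    # according to their own label frequencies.
    counts_a = Counter(scores_a)
    counts_b = Counter(scores_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # undefined, and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from the founder and a second independent scorer.
founder = ["gap", "gap", "null", "gap", "null", "null"]
second = ["gap", "null", "null", "gap", "null", "gap"]
kappa = cohens_kappa(founder, second)
print(round(kappa, 3))
```

Kappa discounts agreement that would occur by chance, which is why it is often reported alongside the raw percentage of matching scores.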

The second risk is overclaiming. Imbas is built against that. It measures behavior, not intent; signal, not verdict. It does not claim bias, censorship, or harm from a single omission. It records what happened under stated conditions and makes the evidence inspectable.

The third risk is that the scored sample is still early. That is why the Reader matters. Public inspections can feed candidate cases, candidate cases can be captured across models, and the strongest cases can move into the validated record. The intake-to-record loop is the thing being funded.
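The intake-to-record loop described above can be sketched as a small state machine. The stage names, the `advance` function, and the single pass/fail review flag are hypothetical simplifications. The actual Imbas workflow may have more stages and richer review criteria.

```python
from enum import Enum


class Stage(Enum):
    PUBLIC_INSPECTION = "public_inspection"
    CANDIDATE_CASE = "candidate_case"
    CAPTURED_ACROSS_MODELS = "captured_across_models"
    VALIDATED_RECORD = "validated_record"
    REJECTED = "rejected"  # kept in the ledger, not discarded


# Forward path of the loop, in the order the proposal describes it.
NEXT_STAGE = {
    Stage.PUBLIC_INSPECTION: Stage.CANDIDATE_CASE,
    Stage.CANDIDATE_CASE: Stage.CAPTURED_ACROSS_MODELS,
    Stage.CAPTURED_ACROSS_MODELS: Stage.VALIDATED_RECORD,
}


def advance(stage, passed_review):
    """Move a case one step forward, or mark it rejected if review fails."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.name} is a terminal stage")
    return NEXT_STAGE[stage] if passed_review else Stage.REJECTED


# A hypothetical case that passes every review step.
stage = Stage.PUBLIC_INSPECTION
while stage in NEXT_STAGE:
    stage = advance(stage, passed_review=True)
print(stage.name)
```

Modeling rejection as a stage rather than a deletion reflects the proposal's point that rejected examples and nulls remain part of the record.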

The fourth risk is that Imbas remains only a public demo instead of becoming useful inspection infrastructure. That is why the institutional learning missions matter. The project needs to learn how answer inspection works in serious workflows, not only in public examples.

If the project fails, the likely outcome is not harm from the tool. It is that Imbas remains a useful but smaller public record instead of becoming durable inspection infrastructure. The downside is limited and the upside is asymmetric: even in a failure case, every scored case, null, reviewer disagreement, rejected example, and raw capture stays in the record and can still improve the inspection layer over time.

How much money have you raised in the last 12 months, and from where?

$0 in external funding for Imbas so far.

I have bootstrapped the project myself to this point. The live Reader, public record, scoring rubric, raw captures, support/methodology layer, and site were built before outside funding.

I recently submitted a grant application to Foresight Institute for $50,000–$75,000, but no funding decision has been made yet.