I have built a working system that detects hallucinations in large language models in real time, as they generate responses. In a strict out-of-sample benchmark against standard confidence-based approaches, my method improves hallucination detection from ~64% to ~74%, a statistically decisive gain, even using a small, lightly trained model without extensive optimization. Early experiments show that hallucinated answers produce consistent, measurable internal differences from grounded answers. This project will rigorously validate and scale that detection approach across larger models and broader prompt sets to determine how reliably it improves AI system trustworthiness.
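To make that comparison concrete, the sketch below shows how a strict out-of-sample evaluation of this kind can be scored. It uses synthetic scores and hypothetical variable names, not the project's data or detector; it only illustrates comparing a confidence-style baseline against an internal-signal detector on held-out examples, with a paired bootstrap to check that the gain is statistically meaningful.

```python
# Illustrative sketch only: synthetic scores, not the project's actual data or method.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Held-out labels: 1 = hallucinated answer, 0 = grounded answer (synthetic here).
y = rng.integers(0, 2, size=2000)

# Two detector scores per answer (synthetic stand-ins):
#   baseline_score  - e.g. a confidence-based signal such as negative mean token log-probability
#   internal_score  - e.g. a signal derived from internal states during the forward pass
baseline_score = y * 0.5 + rng.normal(0, 1.0, size=y.size)
internal_score = y * 1.0 + rng.normal(0, 1.0, size=y.size)

def paired_bootstrap_auc_gap(y, s_a, s_b, n_boot=2000, seed=1):
    """Bootstrap the AUROC difference (s_b minus s_a) over resampled held-out sets."""
    rng = np.random.default_rng(seed)
    gaps = []
    n = y.size
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if y[idx].min() == y[idx].max():   # skip degenerate resamples with one class
            continue
        gaps.append(roc_auc_score(y[idx], s_b[idx]) - roc_auc_score(y[idx], s_a[idx]))
    gaps = np.array(gaps)
    return gaps.mean(), np.quantile(gaps, [0.025, 0.975])

gap, (lo, hi) = paired_bootstrap_auc_gap(y, baseline_score, internal_score)
print(f"AUROC gain of internal signal over baseline: {gap:.3f}  (95% CI: {lo:.3f} to {hi:.3f})")
```

Reporting an interval rather than a single point estimate is what makes a claim like "statistically decisive" checkable.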
Goals:
Validate preliminary findings across multiple larger open-weight language models.
Quantify statistical separability between grounded and fabricated outputs using internal reliability signals.
Establish clear evaluation benchmarks and performance metrics (e.g., classification performance, calibration quality) for real-time hallucination detection, as illustrated in the sketch after this list.
Produce a rigorous empirical evaluation of real-time reliability signals in LLM inference.
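As a reference point for the metrics named above, here is a minimal sketch, assuming the detector emits a probability in [0, 1] that an output is fabricated; the numbers and names are placeholders, not the project's evaluation code. AUROC captures separability between grounded and fabricated outputs, and a binned expected calibration error (ECE) captures calibration quality.

```python
# Minimal metric sketch, assuming detector outputs are probabilities in [0, 1]
# and labels mark fabricated (hallucinated) outputs. Illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: |empirical frequency - mean predicted probability|, weighted by bin size."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo) & (probs <= hi)
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece

# Example with synthetic numbers (placeholders for real evaluation data):
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])
probs  = np.array([0.1, 0.3, 0.8, 0.6, 0.2, 0.9, 0.4, 0.7])
print("AUROC:", roc_auc_score(labels, probs))               # separability of grounded vs. fabricated
print("ECE:  ", expected_calibration_error(probs, labels))  # calibration quality
```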
Approach:
Analyze internal inference-time signals produced by standard forward passes (a generic sketch of such signals follows this list).
Construct structured prompt sets spanning grounded, fabricated, and epistemically ambiguous cases.
Measure internal reliability patterns and test their correlation with hallucination outcomes.
Run controlled statistical experiments to evaluate detection performance and robustness.
Publish a technical report describing evaluation results and empirical findings, while retaining proprietary implementation details.
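For concreteness, the sketch below shows the kind of quantity meant by internal inference-time signals from a standard forward pass, using an off-the-shelf open-weight model (gpt2, chosen only for illustration) via the Hugging Face transformers API. It is a generic example, not the project's proprietary signal or implementation.

```python
# Generic sketch of inference-time signal extraction from a standard forward pass.
# Illustrates the kind of quantity meant by "internal reliability signals";
# NOT the project's proprietary method. Model choice (gpt2) is arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Per-layer hidden-state norms at the last position: one simple internal signal.
layer_norms = [h[0, -1].norm().item() for h in out.hidden_states]

# Next-token predictive entropy: a standard confidence-style baseline signal.
probs = torch.softmax(out.logits[0, -1], dim=-1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()

print("hidden-state norms by layer:", [round(n, 2) for n in layer_norms])
print("next-token entropy:", round(entropy, 3))
```

Per-token signals of this kind are the raw material that the correlation and detection experiments above would operate on.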
Funding would support:
3–4 months of full-time research time.
Compute for larger-scale inference experiments.
Dataset construction and evaluation tooling.
Modest infrastructure costs.
Primary allocation: researcher stipend + compute.
I am an independent AI researcher with a Ph.D. in Computer Science (University of Surrey) focused on computation and information processing in network cascades. My research bridges statistical mechanics, complex systems, and modern neural architectures.
Over the past 18 months, I have designed and implemented a novel transformer-based system for modeling uncertainty in large language models under an NSF ACCESS supercomputing allocation. This included building multi-stage training pipelines, custom loss formulations, calibration diagnostics, and large-scale evaluation tooling.
I have a track record of publishing foundational work on computation in networks (including Scientific Reports and Network Science) and have previously led multi-month independent research programs combining theory, simulation, and empirical validation.
For this project, I am the sole investigator, responsible for model instrumentation, experimental design, evaluation, and analysis.
The primary risk is that the uncertainty signals observed in preliminary experiments do not generalize across model families, scales, or prompt distributions. It is also possible that separability between grounded and fabricated outputs weakens under broader or adversarial evaluation.
However, preliminary results are strongly positive: across 7,200 structured samples, I observed consistent structural divergence between grounded and fabricated regimes.
Even if detection performance proves weaker than expected under broader evaluation, the project would still produce meaningful outputs:
Empirical insight into internal layer dynamics during inference.
Structured signals that differentiate types of hallucination or instability.
Clear evidence about which inference-time indicators are insufficient or non-robust.
A more precise map of where reliability signals degrade.
In other words, failure would still yield mechanistic information about model behavior, not just a negative result on a metric.
I have not raised external funding in the last 12 months. The project to date has been self-funded.