
Disentangling Political Bias from Epistemic Integrity in AI Systems

Science & technology · Technical AI safety · ACX Grants 2025 · AI governance · Forecasting · Global catastrophic risks

David Rozado

Proposal · Grant
Closes December 1st, 2025
$50,000 raised
$50,000 minimum funding
$150,000 funding goal


Description of proposed project

<Title>

Disentangling Political Bias from Epistemic Integrity in AI Systems

<Introduction>

We propose addressing a central dilemma when evaluating Large Language Models (LLMs): how to distinguish between ideological bias and truth-seeking. Recent policy initiatives—like President Trump’s July 2025 Executive Order targeting “woke AI”—mandate both “ideological neutrality” and “truth-seeking” in government-used AIs, but fail to specify how these goals are to be measured or reconciled when they conflict. Neutrality between opposing viewpoints does not necessarily equate to accuracy or truthfulness, especially when some viewpoints (like flat-Earthism) are simply incorrect.

<Key Contributions: Separate Metrics for Viewpoint Preferences and Truth-Seeking>

We introduce a rigorous distinction between two essential metrics:

  • Viewpoint Preferences (VP): Measures the propensity of LLMs to favor certain viewpoints, institutions, or individuals, producing systematically more positive or negative descriptions regardless of factual accuracy.

  • Epistemic Integrity (EI): A composite construct reflecting a model’s capacity for impartial hypothesis testing, objective evidence appraisal, precise perspective taking, accurate factual recall, and well-calibrated probabilistic forecasting.

The VP and EI metrics are unified into a Truth-Ideology State Vector (VPEI), which characterizes an AI as a point in a two-dimensional space: one axis for viewpoint preferences (which can include political affinities), and one for epistemic integrity. This makes it possible to optimize AIs according to different societal or regulatory preferences—for instance, prioritizing ideological neutrality, truth-seeking, or some weighted combination of both.

An Ideological Bias (IB) metric can be derived from VP and EI: IB = VP × (1 − EI). If epistemic integrity is maximal (EI = 1), then IB is 0 regardless of VP (viewpoint preferences should not be characterized as “bias” in an epistemically rigorous, truth-maximizing system). If epistemic integrity is minimal (EI = 0), then IB equals VP (viewpoint preferences completely insensitive to truth are pure “bias”). A viewpoint-centrist system (VP = 0) scores IB = 0 regardless of its EI, since VP is conceptually (though not necessarily empirically) orthogonal to EI.
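As a minimal sketch of these definitions (the class, function names, and example values below are our own illustrative assumptions, not an existing implementation), the VPEI state vector and the derived IB score could be computed as follows:

```python
from dataclasses import dataclass


@dataclass
class VPEIState:
    """Truth-Ideology State Vector: a model as a point in (VP, EI) space.

    vp: viewpoint preference, e.g. in [-1, 1], where 0 is centrist and the
        sign encodes the direction of the preference.
    ei: epistemic integrity, in [0, 1], where 1 is maximal integrity.
    """
    vp: float
    ei: float

    def ideological_bias(self) -> float:
        """IB = VP x (1 - EI): viewpoint preference discounted by integrity."""
        return self.vp * (1.0 - self.ei)


# Illustrative (made-up) values:
truth_seeking_partisan = VPEIState(vp=0.6, ei=1.0)   # IB = 0.0
truth_blind_partisan = VPEIState(vp=0.6, ei=0.0)     # IB = 0.6
centrist = VPEIState(vp=0.0, ei=0.4)                 # IB = 0.0

for model in (truth_seeking_partisan, truth_blind_partisan, centrist):
    print(model, "->", model.ideological_bias())
```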

<Strategies for Measuring Viewpoint Preferences and Truth-Seeking in LLMs>

We propose four complementary approaches to characterize bias and epistemic rigor in AI systems.

<Viewpoint Preferences (VP)>

- Approach 1 (Epistemic Affinity): Analyzes whether LLMs respond more favorably to one political faction than another, as measured by sentiment/stance/political-orientation in LLMs’ responses about politically aligned public figures, viewpoints or institutions. Recognizes that such biases may reflect reality rather than model defects, and that these measures can easily become entangled in partisan debates.
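As a rough, hypothetical sketch of how Approach 1 might be operationalized: `query_model`, `sentiment_score`, and the figure lists below are placeholder stand-ins for an LLM API, a stance/sentiment classifier, and a curated stimulus set, respectively.

```python
from statistics import mean

# Hypothetical stand-ins: in a real study, query_model would call an LLM API
# and sentiment_score would be a trained sentiment/stance classifier.
def query_model(prompt: str) -> str:
    return f"Response to: {prompt}"          # placeholder

def sentiment_score(text: str) -> float:
    """Toy keyword scorer in [-1, 1]; a real one would use a classifier."""
    positive = sum(text.lower().count(w) for w in ("admired", "effective", "honest"))
    negative = sum(text.lower().count(w) for w in ("corrupt", "dishonest", "failed"))
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total

# Illustrative placeholder names, not an actual stimulus set.
FIGURES = {
    "left_aligned": ["Public Figure A", "Public Figure B"],
    "right_aligned": ["Public Figure C", "Public Figure D"],
}

def viewpoint_preference() -> float:
    """VP estimate: mean sentiment gap between responses about the two groups."""
    faction_means = {
        faction: mean(
            sentiment_score(query_model(f"Describe {name} in one paragraph."))
            for name in names
        )
        for faction, names in FIGURES.items()
    }
    # Positive values indicate more favorable treatment of the left-aligned set.
    return faction_means["left_aligned"] - faction_means["right_aligned"]

print(viewpoint_preference())
```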

<Epistemic Integrity (EI)>

- Approach 2 (Epistemic Consistency): Uses “turnabout tests” to assess whether the model applies consistent standards and fair hypothesis testing to ideologically opposed claims (e.g., measures whether LLMs are equally likely to accept or reject claims depending on the political valence of those claims); a simple scoring sketch follows this list.

- Approach 3 (Epistemic Emulation): Evaluates the LLM’s ability to accurately and convincingly reproduce arguments from multiple ideological perspectives, without caricature or distortion. Compares linguistic patterns, framing and use of moral foundations in AIs' responses about different political stances, and applies ideological Turing tests.

- Approach 4 (Epistemic Fidelity): Assesses whether LLMs retrieve facts correctly and make well-calibrated forecasts about future events, independent of political valence.
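A simple, hypothetical scoring scheme for the turnabout tests of Approach 2 is sketched below; `accepts_claim` and the claim pairs are placeholders, not an existing dataset or evaluation harness.

```python
# Hypothetical helper: in practice, this would send a standardized
# accept/reject prompt to the model and parse its verdict.
def accepts_claim(claim: str) -> bool:
    return "reduces crime" in claim          # placeholder verdict rule

# Illustrative matched pairs: same evidential structure, flipped political valence.
CLAIM_PAIRS = [
    ("A study finds that a policy favored by the left reduces crime.",
     "A study finds that a policy favored by the right reduces crime."),
    ("A think tank aligned with the left misreported its methodology.",
     "A think tank aligned with the right misreported its methodology."),
]

def epistemic_consistency() -> float:
    """1.0 = identical verdicts on every matched pair; 0.0 = none match."""
    matches = [accepts_claim(left) == accepts_claim(right)
               for left, right in CLAIM_PAIRS]
    return sum(matches) / len(matches)

print(epistemic_consistency())  # 1.0 under this toy verdict rule
```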

We argue that most existing studies on AI political bias focus on the first and least conclusive approach, epistemic affinity (essentially a disparate-impact test), while neglecting the more philosophically robust criteria of hypothesis testing, perspective-taking, and retrocasting/forecasting accuracy (Approaches 2, 3 and 4).
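Returning to Approach 4 (Epistemic Fidelity), one standard way to summarize forecast calibration, shown here with invented example data, is to compare Brier scores on resolved questions grouped by political valence:

```python
from statistics import mean

def brier_score(forecasts):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return mean((p - outcome) ** 2 for p, outcome in forecasts)

# Invented example (probability, outcome) pairs for resolved questions,
# grouped by the political valence of the question topic.
forecasts_by_valence = {
    "left_coded": [(0.8, 1), (0.3, 0), (0.6, 1)],
    "right_coded": [(0.7, 0), (0.4, 1), (0.9, 1)],
}

scores = {valence: brier_score(pairs) for valence, pairs in forecasts_by_valence.items()}

# An epistemically faithful model should show low Brier scores overall and
# little gap between the two groups.
print(scores)
```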

<Approaches to Mitigating Ideological Bias and Maximizing Truth-Seeking in LLMs>

We plan to explore several strategies for reducing political bias and improving truth-seeking behaviour in LLMs:

- Reward-Based Fine-Tuning: Encapsulate the metrics from approaches 1, 2, 3 and 4 above into reward functions during post-training reinforcement fine-tuning. By optimizing against these targeted rewards, the model can be guided to reduce measurable political bias and epistemic distortions (a sketch of such a combined reward follows this list).

- Debiasing Prompts: Assess the effectiveness of prompts designed to induce value-neutral forecasting, drawing on best practices such as those described in Philip Tetlock’s book Superforecasting, as a way to optimize VPEI.

- Reasoning Traces in Supervised Fine-Tuning: Use test-time scaling to promote political neutrality and epistemic integrity in LLM outputs. Specifically, we suggest leveraging recent data-efficient supervised fine-tuning methods that prime LLMs to produce reasoning traces, and adapting this approach to foster epistemic integrity and/or mitigate political bias in LLMs’ outputs.
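A minimal sketch, under our own assumptions, of how the metrics from Approaches 1-4 could be folded into a single scalar reward for reinforcement fine-tuning (the metric functions and weights below are hypothetical placeholders, not an existing training pipeline):

```python
from typing import Callable, Dict

# Hypothetical per-response metric functions. Each maps (prompt, response) to a
# score: the epistemic-integrity components in [0, 1], viewpoint preference as
# a signed value in [-1, 1].
Metric = Callable[[str, str], float]

def vpei_reward(prompt: str, response: str,
                metrics: Dict[str, Metric],
                weights: Dict[str, float]) -> float:
    """Weighted scalar reward: reward epistemic integrity, penalize |VP|."""
    vp = metrics["viewpoint_preference"](prompt, response)
    ei = sum(metrics[name](prompt, response)
             for name in ("consistency", "emulation", "fidelity")) / 3
    return weights["ei"] * ei - weights["vp"] * abs(vp)

# Toy usage with dummy metrics; real metrics would implement Approaches 1-4.
dummy = {
    "viewpoint_preference": lambda p, r: 0.2,
    "consistency": lambda p, r: 0.9,
    "emulation": lambda p, r: 0.8,
    "fidelity": lambda p, r: 0.7,
}
print(vpei_reward("prompt", "response", dummy, {"ei": 1.0, "vp": 0.5}))  # 0.7
```

The relative weights would encode the societal or regulatory preference discussed above, for example prioritizing ideological neutrality (larger weight on VP) versus truth-seeking (larger weight on EI).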

<Conclusion>

We have presented a framework for measuring and optimizing the balance between viewpoint preferences and truth-seeking in LLMs. By separating these concepts and offering clear metrics, our approach enables developers and policymakers to design and evaluate AI systems using transparent, scientifically grounded criteria. This not only clarifies AI behavior but also supports regulatory compliance by allowing model behavior to be tailored to context-specific needs.

Why are you qualified to work on this?

- David Rozado is an Associate Professor at Otago Polytechnic in New Zealand. He has a background in computer science and computational social science. David is one of the most widely cited academics on the topic of political bias in AI systems. He has also extensively documented sociological phenomena such as the spread of prejudice-denoting terms and social-justice-associated terminology through institutions such as news media, the Academy and Wikipedia.

Rozado, D. (2024). The political preferences of LLMs. PLOS ONE, 19(7), e0306621.

Rozado, D. (2023). The political biases of ChatGPT. Social Sciences, 12(3), 148.

Rozado, D. (2020). Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types. PLOS ONE, 15(4), e0231189.

Rozado, D., Hughes, R., & Halberstadt, J. (2022). Longitudinal analysis of sentiment and emotion in news media headlines using automated labelling with Transformer language models. PLOS ONE, 17(10), e0276367.

Rozado, D. (2022). Themes in Academic Literature: Prejudice and Social Justice. Academic Questions, 35(2), 16-29.

Rozado, D., Al-Gharbi, M., & Halberstadt, J. (2023). Prevalence of prejudice-denoting words in news media discourse: A chronological analysis. Social Science Computer Review, 41(1), 99-122.

- Philip E. Tetlock is a renowned psychologist and political scientist best known for his pioneering research on expert judgment and forecasting. One of his many influential academic accomplishments is the "Good Judgment Project," which demonstrated that some laypeople, called "superforecasters," could make more accurate predictions about geopolitical events than many experts or intelligence analysts. Tetlock’s earlier work, detailed especially in his book Expert Political Judgment, showed that expert predictions were often only slightly better than chance. His research has significantly influenced the fields of political science, psychology, and decision-making by highlighting the limits of expert opinion and promoting evidence-based forecasting methods.

Beyond his renowned work on forecasting and expert judgment, Philip E. Tetlock has made significant contributions to the study of accountability in decision-making, the understanding of cognitive biases, and the integration of psychology with political science. His research on how accountability influences reasoning, his exploration of “taboo cognition” and sacred values, and his influential publications have helped bridge disciplines and shape both academic theory and practical policy. Tetlock’s impact is evident not only in the scholarly world—where he has received numerous prestigious awards—but also among policymakers and intelligence agencies, who have adopted his insights to improve forecasting and decision-making.

Sniderman, P. M., Brody, R. A., & Tetlock, P. E. (1991). Reasoning and choice: Explorations in political psychology. Cambridge University Press.

Lerner, J. S., & Tetlock, P. E. (1999). Accounting for the effects of accountability. Psychological bulletin, 125(2), 255.

Tetlock, P. E. (2017). Expert political judgment: How good is it? How can we know? (New edition). Princeton University Press.

Tetlock, P. E., Kristel, O. V., Elson, S. B., Green, M. C., & Lerner, J. S. (2000). The psychology of the unthinkable: taboo trade-offs, forbidden base rates, and heretical counterfactuals. Journal of personality and social psychology, 78(5), 853.

Tetlock, P. E., & Gardner, D. (2016). Superforecasting: The art and science of prediction. Random House.

Tetlock, P. E. (1983). Accountability and complexity of thought. Journal of personality and social psychology, 45(1), 74.

Other links

A more comprehensive description of this project can be found in the following link:

https://docs.google.com/document/d/1giPqTGSWnbO9V2YCJ6d2_XtHfnCgA9KBLtlcNMG618o/edit?usp=sharing

What would you do if not funded?

We will keep trying

How much money do you need?

50,000 - 150,000 USD

The $50,000 minimum will fund the work on Epistemic Affinity (Approach 1: measuring viewpoint preferences in AIs) and Epistemic Consistency (Approach 2: using “turnabout tests” to assess whether AIs apply consistent standards when evaluating information and engage in fair hypothesis testing). The $150,000 goal would allow the project to do further work on Epistemic Emulation (Approach 3: measuring AIs' accuracy at perspective taking) and Epistemic Fidelity (Approach 4: assessing AIs' ability to make well-calibrated probabilistic forecasts regardless of political valence), as well as to explore post-training strategies for optimizing models' VPEI.


Offers

ACX Grants (5 days ago), offering $50,000: Grant from ACX Grants 2025