Beyond Consensus

  • Authors: Suryaansh Jain, Umair Z. Ahmed, Shubham Sahai, and Ben Leong
  • Published: 2025
  • Source: arXiv preprint (arXiv:2510.11822)
  • Document Link: https://arxiv.org/abs/2510.11822

This paper identifies a critical “agreeableness bias” where LLM judges excel at identifying valid outputs but fail significantly at spotting invalid ones. To solve this, the authors propose a “minority-veto” strategy and a regression-based framework to ensure more accurate and reliable AI evaluations.

The Problem (Agreeableness Bias): While LLM-as-a-judge is scalable, it suffers from a strong positive bias. LLMs show high True Positive Rates (>96%) but very low True Negative Rates (<25%), meaning they often mistakenly agree with incorrect or invalid answers.
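The TPR/TNR asymmetry above is easy to make concrete. A minimal sketch computing both rates from confusion counts; the counts below are made up purely to illustrate the regime the paper reports (TPR above 96%, TNR below 25%), not the paper's actual data:

```python
def rates(tp, fn, tn, fp):
    """True-positive rate (valid outputs correctly accepted) and
    true-negative rate (invalid outputs correctly rejected)."""
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative counts only: out of 100 valid and 100 invalid outputs,
# a biased judge accepts almost everything.
tpr, tnr = rates(tp=97, fn=3, tn=24, fp=76)
print(tpr, tnr)  # 0.97 0.24
```

The 76 false positives are the "agreeableness" failure mode: invalid answers the judge waves through.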

The Flaw of Majority Voting: Traditional ensemble methods like majority voting fail to fix this because if most models share the same bias, the group decision remains incorrect.

Proposed Solution 1 (Minority-Veto): The researchers introduce an “optimal minority-veto strategy.” If even a single reliable model (or a small minority) flags an output as invalid, that “veto” is prioritized to counteract the general tendency to agree.
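The veto logic can be sketched as follows. This is a simplified illustration of the idea, not the paper's implementation: the judge names, the dict structure, and the choice of a fixed veto set are all assumptions made for the example.

```python
def minority_veto(verdicts, veto_judges):
    """Aggregate boolean validator verdicts (True = 'output is valid').

    If any designated reliable judge in `veto_judges` flags the output
    as invalid, that veto overrides the majority; otherwise fall back
    to a plain majority vote.
    """
    # A single trusted "invalid" verdict is enough to reject the output,
    # counteracting the ensemble's tendency to agree.
    if any(not verdicts[j] for j in veto_judges if j in verdicts):
        return False
    votes = list(verdicts.values())
    return sum(votes) > len(votes) / 2

verdicts = {"judge_a": True, "judge_b": True, "judge_c": False}
# Under majority voting the output would pass 2-1; the designated
# reliable judge_c vetoes it instead:
print(minority_veto(verdicts, veto_judges={"judge_c"}))  # False
```

Note how this also shows why plain majority voting fails: the two agreeable judges outvote the one accurate dissenter unless the dissent is given veto power.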

Proposed Solution 2 (Regression Framework): For higher precision, a novel regression-based model is used to estimate and correct the specific bias of validators using a small amount of ground-truth data.
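One way to picture the regression idea: fit per-judge weights on a small labeled set, so that a validator whose verdicts carry no signal (e.g. one that says "valid" to everything) is automatically down-weighted. The pure-Python logistic-regression sketch below is an assumption about the general shape of such a framework, not the paper's specific model:

```python
import math

def fit_bias_weights(judge_votes, labels, lr=0.5, epochs=2000):
    """Fit per-judge weights and an intercept via logistic regression
    (plain SGD) on a small ground-truth set. Judges whose votes don't
    predict the true label end up with negligible weight."""
    n_judges = len(judge_votes[0])
    w = [0.0] * n_judges
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(judge_votes, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(valid)
            err = p - y
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b

def predict(w, b, x):
    """Weighted, bias-corrected verdict from raw judge votes."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z)) > 0.5

# Toy ground-truth set: judge 0 always votes "valid" (pure agreeableness
# bias); judge 1 tracks the true label.
votes = [[1, 1], [1, 0], [1, 1], [1, 0]]
labels = [1, 0, 1, 0]
w, b = fit_bias_weights(votes, labels)
print(predict(w, b, [1, 0]))  # False: biased judge 0 is discounted
```

The design point is that the small ground-truth sample lets the regression *measure* each validator's bias rather than assuming all judges are equally trustworthy.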

Key Impact: These strategies significantly improve the reliability of automated evaluations, making it safer and more cost-effective for developers to test and switch between new Large Language Models.
