All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

Published: January 7, 2026 | arXiv ID: 2601.04160v1

By: Yuechen Jiang, Zhiwei Liu, Yupeng Cao and more

Potential Business Impact:

Helps computers spot fake money news better.

Business Areas:

A/B Testing, Data and Analytics

We introduce RFC Bench, a benchmark for evaluating large language models on financial misinformation in realistic news settings. RFC Bench operates at the paragraph level and captures the contextual complexity of financial news, where meaning emerges from dispersed cues. The benchmark defines two complementary tasks: reference-free misinformation detection, and comparison-based diagnosis using paired original and perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, while reference-free settings expose significant weaknesses, including unstable predictions and elevated rates of invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC Bench provides a structured testbed for studying reference-free reasoning and advancing more reliable financial misinformation detection in real-world settings.
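As a rough illustration of the two quantities the abstract mentions, accuracy and the rate of invalid outputs, the following minimal sketch scores a list of model responses. It is not the benchmark's evaluation code: the label set (`original` / `perturbed`), the `Prediction` record, and the `parse_label` and `score` helpers are all hypothetical names chosen for this example, and the rule that an unparseable response counts as wrong is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set; the abstract does not specify the actual labels.
VALID_LABELS = {"original", "perturbed"}


@dataclass
class Prediction:
    gold: str        # gold label for the paragraph
    raw_output: str  # raw text returned by the model


def parse_label(raw_output: str) -> Optional[str]:
    """Return a normalized label, or None if the output is not a valid label."""
    text = raw_output.strip().lower()
    return text if text in VALID_LABELS else None


def score(predictions: list[Prediction]) -> dict[str, float]:
    """Compute accuracy and invalid-output rate over all predictions.

    Assumption: an invalid (unparseable) output is counted as incorrect.
    """
    n = len(predictions)
    if n == 0:
        raise ValueError("no predictions to score")
    parsed = [parse_label(p.raw_output) for p in predictions]
    invalid = sum(1 for label in parsed if label is None)
    correct = sum(1 for p, label in zip(predictions, parsed) if label == p.gold)
    return {"accuracy": correct / n, "invalid_rate": invalid / n}
```

The same `score` function could be run once on responses collected without a reference and once on responses collected with the paired original, so that the two settings are compared on identical metrics.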

Category:

Computation and Language

88%

RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking

Computation and Language

Tests computers' ability to spot fake news.

14 Jun 2025 1

87%

CounterBench: A Benchmark for Counterfactuals Reasoning in Large Language Models

Computation and Language

Teaches computers to think "what if" better.

16 Feb 2025 1

Country of Origin

🇬🇧 United Kingdom

Page Count

39 pages
