Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM
By: Furong Jia, Yuan Pu, Finn Guo, and more
Potential Business Impact:
Helps doctors guess likely illnesses using word patterns.
Large language models (LLMs) excel on multiple-choice clinical diagnosis benchmarks, yet it is unclear how much of this performance reflects underlying probabilistic reasoning. We study this through questions from MedQA, where the task is to select the most likely diagnosis. We introduce the Frequency-Based Probabilistic Ranker (FBPR), a lightweight method that scores options with a smoothed Naive Bayes over concept-diagnosis co-occurrence statistics from a large corpus. When the co-occurrence statistics are sourced from the pretraining corpora of OLMo and Llama, FBPR achieves performance comparable to the corresponding LLMs pretrained on those same corpora. Direct LLM inference and FBPR largely get different questions correct, with an overlap only slightly above random chance, indicating that the two methods have complementary strengths. These findings highlight the continued value of explicit probabilistic baselines: they provide a meaningful performance reference point and a complementary signal for potential hybridization. While LLM performance appears to be driven by a mechanism other than simple frequency aggregation, we show that an approach reminiscent of historically grounded, low-complexity expert systems still accounts for a substantial portion of benchmark performance.
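The core idea in the abstract, scoring each answer option with a smoothed Naive Bayes over concept-diagnosis co-occurrence counts, can be sketched in a few lines. The code below is a minimal illustration, not the authors' released implementation: the class name, Laplace smoothing, and toy counts are assumptions, and the paper's actual concept extraction and corpus statistics are not shown.

```python
import math
from collections import defaultdict

class FrequencyBasedProbabilisticRanker:
    """Illustrative smoothed Naive Bayes ranker over concept-diagnosis counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                   # Laplace smoothing constant (assumed)
        self.diag_counts = defaultdict(int)  # N(d): corpus count of diagnosis d
        self.cooc_counts = defaultdict(int)  # N(c, d): concept-diagnosis co-occurrences
        self.vocab = set()                   # distinct concepts seen

    def add_cooccurrence(self, concept, diagnosis, count=1):
        # Accumulate co-occurrence statistics mined from a corpus.
        self.cooc_counts[(concept, diagnosis)] += count
        self.diag_counts[diagnosis] += count
        self.vocab.add(concept)

    def log_score(self, concepts, diagnosis):
        # log P(d) + sum_c log P(c | d), with Laplace smoothing.
        total = sum(self.diag_counts.values())
        prior = math.log(
            (self.diag_counts.get(diagnosis, 0) + self.alpha)
            / (total + self.alpha * len(self.diag_counts))
        )
        denom = self.diag_counts.get(diagnosis, 0) + self.alpha * len(self.vocab)
        likelihood = sum(
            math.log((self.cooc_counts.get((c, diagnosis), 0) + self.alpha) / denom)
            for c in concepts
        )
        return prior + likelihood

    def rank(self, concepts, options):
        # Return answer options sorted by descending smoothed score.
        return sorted(options, key=lambda d: self.log_score(concepts, d), reverse=True)


# Toy usage with made-up counts: rank two candidate diagnoses for a question
# whose extracted concepts are "chest pain" and "productive cough".
fbpr = FrequencyBasedProbabilisticRanker(alpha=1.0)
fbpr.add_cooccurrence("chest pain", "myocardial infarction", 500)
fbpr.add_cooccurrence("chest pain", "pneumonia", 120)
fbpr.add_cooccurrence("productive cough", "pneumonia", 400)
fbpr.add_cooccurrence("productive cough", "myocardial infarction", 15)

print(fbpr.rank(["chest pain", "productive cough"],
                ["myocardial infarction", "pneumonia"]))
```

Under this sketch, an option's score is just its smoothed log-prior plus the smoothed log-likelihood of the question's concepts, which is why the baseline stays cheap: it only needs count lookups over the pretraining corpus, no model inference.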
Similar Papers
Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research
Artificial Intelligence
Makes AI better at guessing, not knowing for sure.
Conformal Sets in Multiple-Choice Question Answering under Black-Box Settings with Provable Coverage Guarantees
Computation and Language
Makes AI answers more trustworthy and less wrong.
Quantifying and Mitigating Selection Bias in LLMs: A Transferable LoRA Fine-Tuning and Efficient Majority Voting Approach
Computation and Language
Makes AI answer questions more fairly.