Mitigating Hallucinations in Zero-Shot Scientific Summarisation: A Pilot Study
By: Imane Jaaouine, Ross D. King
Potential Business Impact:
Makes AI summaries of science papers more accurate.
Large language models (LLMs) produce context inconsistency hallucinations: LLM-generated outputs that are misaligned with the user prompt. This research project investigates whether prompt engineering (PE) methods can mitigate context inconsistency hallucinations in zero-shot LLM summarisation of scientific texts, where zero-shot indicates that the LLM relies purely on its pre-training data. Across eight yeast biotechnology research paper abstracts, six instruction-tuned LLMs were prompted with seven methods: a baseline prompt, two levels of increasing instruction complexity (PE-1 and PE-2), two levels of context repetition (CR-K1 and CR-K2), and two levels of random addition (RA-K1 and RA-K2). Context repetition involved the identification and repetition of K key sentences from the abstract, whereas random addition involved the repetition of K randomly selected sentences from the abstract, where K is 1 or 2. A total of 336 LLM-generated summaries were evaluated using six metrics: ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, METEOR, and cosine similarity, which together quantify the lexical and semantic alignment between the summaries and the abstracts. Four hypotheses on the effects of prompt methods on summary alignment with the reference text were tested. Statistical analysis on 3744 collected data points was performed using bias-corrected and accelerated (BCa) bootstrap confidence intervals and Wilcoxon signed-rank tests with Bonferroni-Holm correction. The results demonstrated that CR and RA significantly improve the lexical alignment of LLM-generated summaries with the abstracts. These findings indicate that prompt engineering has the potential to mitigate hallucinations in zero-shot scientific summarisation tasks.
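To make the prompt methods concrete, the sketch below builds baseline, CR-K, and RA-K prompts for a single abstract. It is a minimal illustration, not the study's code: the instruction wording, the sentence splitter, and the word-frequency heuristic for picking "key" sentences are all assumptions, since the exact prompts and key-sentence selection procedure are not given here.

```python
import random
import re
from collections import Counter

def split_sentences(text: str) -> list[str]:
    # Naive splitter; the study's actual segmentation method is not stated.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def key_sentences(sentences: list[str], k: int) -> list[str]:
    # Hypothetical centrality heuristic: favour sentences whose words are
    # frequent across the whole abstract. How the paper identifies "key"
    # sentences is not specified here.
    freqs = Counter(w for s in sentences for w in s.lower().split())
    def score(s: str) -> float:
        words = s.lower().split()
        return sum(freqs[w] for w in words) / max(len(words), 1)
    return sorted(sentences, key=score, reverse=True)[:k]

def build_prompt(abstract: str, method: str = "baseline", k: int = 1,
                 seed: int = 0) -> str:
    # Assumed instruction wording; the study's exact prompts are not given.
    base = f"Summarise the following abstract.\n\nAbstract:\n{abstract}"
    sentences = split_sentences(abstract)
    if method == "CR":    # context repetition: repeat K key sentences
        extra = key_sentences(sentences, k)
    elif method == "RA":  # random addition: repeat K random sentences
        extra = random.Random(seed).sample(sentences, min(k, len(sentences)))
    else:                 # baseline prompt, no repetition
        return base
    return base + "\n\nKey context (repeated from the abstract):\n" + "\n".join(extra)

# Example: a CR-K2 prompt appends the two highest-scoring sentences.
# print(build_prompt(some_abstract, method="CR", k=2))
```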
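The six alignment metrics can be computed with standard open-source packages. A minimal sketch, assuming the rouge-score, bert-score, nltk, and scikit-learn libraries; the TF-IDF cosine similarity is a stand-in, since the representation behind the paper's cosine metric is not stated here, and NLTK's METEOR additionally requires the wordnet corpus to be downloaded.

```python
# Assumed dependencies: pip install rouge-score bert-score nltk scikit-learn
from rouge_score import rouge_scorer
from bert_score import score as bert_score
from nltk.translate.meteor_score import meteor_score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def evaluate(summary: str, abstract: str) -> dict[str, float]:
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    # RougeScorer.score takes (target, prediction); F1 is reported here.
    metrics = {k: v.fmeasure for k, v in scorer.score(abstract, summary).items()}
    _, _, f1 = bert_score([summary], [abstract], lang="en", verbose=False)
    metrics["bertscore_f1"] = float(f1[0])
    # NLTK's METEOR expects pre-tokenised references and hypothesis.
    metrics["meteor"] = meteor_score([abstract.split()], summary.split())
    # Cosine similarity over TF-IDF vectors; the paper's embedding choice is
    # not stated here, so TF-IDF stands in for it.
    tfidf = TfidfVectorizer().fit_transform([summary, abstract])
    metrics["cosine"] = float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])
    return metrics
```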
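The statistical pipeline (BCa bootstrap confidence intervals plus Wilcoxon signed-rank tests with Bonferroni-Holm correction) maps directly onto SciPy and statsmodels. A minimal sketch on synthetic stand-in scores, not the study's data:

```python
import numpy as np
from scipy.stats import bootstrap, wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Synthetic stand-in scores: one ROUGE-1 value per summary (8 abstracts x 6
# LLMs = 48 summaries per prompt method); the study's real scores differ.
baseline = rng.beta(2, 5, size=48)
methods = {
    "CR-K1": np.clip(baseline + rng.normal(0.03, 0.02, size=48), 0, 1),
    "RA-K1": np.clip(baseline + rng.normal(0.02, 0.02, size=48), 0, 1),
}

# BCa bootstrap 95% CI for the mean paired difference vs. the baseline.
for name, scores in methods.items():
    ci = bootstrap((scores - baseline,), np.mean, method="BCa",
                   confidence_level=0.95, random_state=rng).confidence_interval
    print(f"{name}: mean-diff 95% BCa CI [{ci.low:.4f}, {ci.high:.4f}]")

# Wilcoxon signed-rank tests against the baseline, with Bonferroni-Holm
# correction across the family of comparisons.
pvals = [wilcoxon(s, baseline).pvalue for s in methods.values()]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for name, p, r in zip(methods, p_adj, reject):
    print(f"{name}: Holm-adjusted p = {p:.4g}, reject H0: {r}")
```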
Similar Papers
Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization
Computation and Language
Helps AI spot fake facts in mixed-up stories.
Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models
Computation and Language
Helps AI tell when it's making things up.
UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output
Computation and Language
Finds fake facts in AI answers.