Semi-automated Fact-checking in Portuguese: Corpora Enrichment using Retrieval with Claim extraction
By: Juliana Resplande Sant'anna Gomes, Arlindo Rodrigues Galvão Filho
Potential Business Impact:
Helps stop fake news by finding proof.
Disinformation often spreads faster than it can be fact-checked manually, which makes Semi-Automated Fact-Checking (SAFC) systems an urgent need. For Portuguese, publicly available datasets that integrate external evidence are scarce: many existing resources support only classification based on intrinsic text features, whereas external evidence is essential for developing robust automated fact-checking (AFC) systems. This dissertation addresses this gap by developing, applying, and analyzing a methodology to enrich Portuguese news corpora (Fake.Br, COVID19.BR, MuMiN-PT) with external evidence. The approach simulates a user's verification process: a Large Language Model (LLM), specifically Gemini 1.5 Flash, extracts the main claim from each text, and search engine APIs (Google Search API, Google FactCheck Claims Search API) retrieve relevant external documents as evidence. Additionally, a data validation and preprocessing framework, including near-duplicate detection, is introduced to enhance the quality of the base corpora.
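The enrichment flow described in the abstract (filter near-duplicates, extract the main claim, retrieve evidence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how near-duplicates are detected, so word-shingle Jaccard similarity with a 0.8 threshold is an assumption chosen for illustration. The names `shingles`, `jaccard`, `drop_near_duplicates`, `EnrichedItem`, and `enrich` are hypothetical, and the LLM and search API calls are passed in as plain callables rather than tied to any real client library.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(texts: Iterable[str], threshold: float = 0.8) -> List[str]:
    """Keep a text only if it is below the threshold against every kept text.

    Assumption: the similarity measure and threshold are illustrative only.
    The pairwise comparison is quadratic, which is fine for a small sketch.
    """
    kept: List[str] = []
    kept_shingles: List[Set[Shingle]] = []
    for text in texts:
        current = shingles(text)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(text)
            kept_shingles.append(current)
    return kept


@dataclass
class EnrichedItem:
    """A news text paired with its extracted claim and retrieved evidence."""
    text: str
    claim: str
    evidence: List[str] = field(default_factory=list)


def enrich(
    texts: Iterable[str],
    extract_claim: Callable[[str], str],      # stand-in for the LLM call
    search: Callable[[str], Iterable[str]],   # stand-in for a search API call
    threshold: float = 0.8,
) -> List[EnrichedItem]:
    """Deduplicate the corpus, then attach a claim and evidence to each text."""
    items: List[EnrichedItem] = []
    for text in drop_near_duplicates(texts, threshold):
        claim = extract_claim(text)
        items.append(EnrichedItem(text=text, claim=claim, evidence=list(search(claim))))
    return items
```

In practice, `extract_claim` would wrap a prompt sent to the LLM and `search` would wrap a request to a search or fact-check API. Keeping both as injected callables lets the pipeline be exercised with simple stubs before any external service is involved.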
Similar Papers
A Generative-AI-Driven Claim Retrieval System Capable of Detecting and Retrieving Claims from Social Media Platforms in Multiple Languages
Computation and Language
Helps fact-checkers avoid checking same claims twice.
M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset
Computation and Language
Helps computers check if pictures and words tell the truth.
Towards Automated Fact-Checking of Real-World Claims: Exploring Task Formulation and Assessment with LLMs
Computation and Language
Helps computers check if news is true.