Score: 1

SPER: Accelerating Progressive Entity Resolution via Stochastic Bipartite Maximization

Published: December 29, 2025 | arXiv ID: 2512.23491v1

By: Dimitrios Karapiperis , George Papadakis , Themis Palpanas and more

Potential Business Impact:

Finds matching info faster in huge data.

Business Areas:

Enterprise Resource Planning (ERP) Software

Entity Resolution (ER) is a critical data cleaning task for identifying records that refer to the same real-world entity. In the era of Big Data, traditional batch ER is often infeasible due to volume and velocity constraints, necessitating Progressive ER methods that maximize recall within a limited computational budget. However, existing progressive approaches fail to scale to high-velocity streams because they rely on deterministic sorting to prioritize candidate pairs, a process that incurs prohibitive super-linear complexity and heavy initialization costs. To address this scalability wall, we introduce SPER (Stochastic Progressive ER), a novel framework that redefines prioritization as a sampling problem rather than a ranking problem. By replacing global sorting with a continuous stochastic bipartite maximization strategy, SPER acts as a probabilistic high-pass filter that selects high-utility pairs in strictly linear time. Extensive experiments on eight real-world datasets demonstrate that SPER achieves significant speedups (3x to 6x) over state-of-the-art baselines while maintaining comparable recall and precision.