
A faster algorithm for efficient longest common substring calculation for non-parametric entropy estimation in sequential data

Published: October 15, 2025 | arXiv ID: 2510.13330v1

By: Bridget Smart, Max Ward, Matthew Roughan

Potential Business Impact:

Finds patterns faster in data streams.

Business Areas:

Text Analytics, Data and Analytics, Software

Non-parametric entropy estimation on sequential data is a fundamental tool in signal processing, capturing information flow within or between processes to measure predictability, redundancy, or similarity. Methods based on longest common substrings (LCS) provide a non-parametric estimate of typical set size but are often inefficient, which limits their use on real-world data. We introduce LCSFinder, a new algorithm that improves the worst-case performance of LCS calculations from cubic to log-linear time. Although the algorithm is built on standard algorithmic constructs, including sorted suffix arrays and persistent binary search trees, the details require care to provide the matches required for entropy estimation on dynamically growing sequences. We demonstrate that LCSFinder achieves dramatic speedups over existing implementations on real and simulated data, enabling entropy estimation at scales previously infeasible in practical signal processing.
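To make the cost of the cubic baseline concrete, the following is a minimal sketch of a naive match-length entropy estimator of the kind the abstract describes as inefficient. It is not LCSFinder and is not taken from the paper: the function names `match_lengths` and `entropy_estimate`, the rule that a match must lie entirely within the already-seen prefix, and the estimator form n·log2(n) / ΣΛ_i (a common Lempel-Ziv style estimator) are all assumptions made for illustration.

```python
import math


def match_lengths(seq):
    """For each position i, return Lambda_i = 1 + length of the longest
    substring starting at i that also occurs entirely within seq[:i].

    Naive triple loop: O(n^3) in the worst case (assumed baseline,
    not the paper's algorithm).
    """
    n = len(seq)
    lambdas = []
    for i in range(n):
        longest = 0
        for j in range(i):  # candidate start of an earlier occurrence
            k = 0
            # Extend the match while both copies agree and the earlier
            # copy stays inside the history seq[:i] (assumed convention).
            while i + k < n and j + k < i and seq[j + k] == seq[i + k]:
                k += 1
            longest = max(longest, k)
        lambdas.append(longest + 1)
    return lambdas


def entropy_estimate(seq):
    """Entropy rate estimate in bits per symbol: n * log2(n) / sum(Lambda_i).

    Assumed estimator form; returns 0.0 for sequences shorter than 2.
    """
    n = len(seq)
    if n < 2:
        return 0.0
    return n * math.log2(n) / sum(match_lengths(seq))


if __name__ == "__main__":
    print(match_lengths("abab"))
    print(entropy_estimate("abcdefgh"))
    print(entropy_estimate("aaaaaaaa"))
```

Under these assumptions, a sequence with no repeats gives every Λ_i equal to 1 and the largest possible estimate, while a highly repetitive sequence yields longer matches and a lower estimate. The nested scan over all earlier start positions is what a suffix-array-based approach would replace.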

An Adaptive CMSA for Solving the Longest Filled Common Subsequence Problem with an Application in Audio Querying

Sound

Finds patterns in DNA and music faster.

12 Sep 2025 1

85%

Space-Efficient and Output-Sensitive Algorithms for the Longest Common Bitonic Subsequence

Data Structures and Algorithms

Finds patterns that go up then down in data.

12 Nov 2025 0

84%

The Complexity of Maximal Common Subsequence Enumeration

Data Structures and Algorithms

Finds important patterns in data faster.

7 Apr 2025 1


Repos / Data Links

github.com

Page Count

6 pages
