Score: 1

Efficient Greedy Discrete Subtrajectory Clustering

Published: March 18, 2025 | arXiv ID: 2503.14115v1

By: Ivor van der Hoog , Lara Ost , Eva Rotenberg and more

Potential Business Impact:

Groups similar paths together to find patterns.

Business Areas:

A/B Testing Data and Analytics

We cluster a set of trajectories T using subtrajectories of T. Clustering quality may be measured by the number of clusters, the number of vertices of T that are absent from the clustering, and by the Fr\'{e}chet distance between subtrajectories in a cluster. A $\Delta$-cluster of T is a cluster ${\mathcal{P}}$ of subtrajectories of T with a centre $P \in {\mathcal{P}}$ with complexity $\ell$, where all subtrajectories in ${\mathcal{P}}$ have Fr\'{e}chet distance at most $\Delta$ to $P$. Buchin, Buchin, Gudmundsson, L\"{o}ffler and Luo present two $O(n^2 + n m \ell)$-time algorithms: SC($\max$, $\ell$, $\Delta$, T) computes a single $\Delta$-cluster where $P$ has at least $\ell$ vertices and maximises the cardinality $m$ of ${\mathcal{P}}$. SC($m$, $\max$, $\Delta$, T) computes a single $\Delta$-cluster where ${\mathcal{P}}$ has cardinality $m$ and maximises the complexity $\ell$ of $P$. We use such maximum-cardinality clusters in a greedy clustering algorithm. We provide an efficient implementation of SC($\max$, $\ell$, $\Delta$, T) and SC($m$, $\max$, $\Delta$, T) that significantly outperforms previous implementations. We use these functions as a subroutine in a greedy clustering algorithm, which performs well when compared to existing subtrajectory clustering algorithms on real-world data. Finally, we observe that, for fixed $\Delta$ and T, these two functions always output a point on the Pareto front of some bivariate function $\theta(\ell, m)$. We design a new algorithm PSC($\Delta$, T) that in $O( n^2 \log^4 n)$ time computes a $2$-approximation of this Pareto front. This yields a broader set of candidate clusters, with comparable quality. We show that using PSC($\Delta$, T) as a subroutine improves the clustering quality and performance even further.