Large Scale Community-Aware Network Generation
By: Vikram Ramavarapu, João Alfredo Cardoso Lamy, Mohammad Dindoost and more
Potential Business Impact:
Builds synthetic networks with known community labels much faster and at larger scale.
Community detection, or network clustering, is used to identify latent community structure in networks. Because labeled ground truth is scarce in real-world networks, evaluating these algorithms is difficult. To address this, researchers use synthetic network generators that produce networks with ground-truth community labels. RECCS is one such generator: it takes a network and its clustering as input and generates a synthetic network through a modular pipeline. Each generated ground-truth cluster preserves key characteristics of the corresponding input cluster, including connectivity, minimum degree, and degree sequence distribution. The output consists of a synthetically generated network and disjoint ground-truth cluster labels for all nodes. In this paper, we present two enhanced versions: RECCS+ and RECCS++. RECCS+ maintains algorithmic fidelity to the original RECCS while introducing parallelization through an orchestrator that coordinates algorithmic components across multiple processes and employs multithreading. RECCS++ builds upon this foundation with additional algorithmic optimizations to achieve further speedup. Our experimental results demonstrate that RECCS+ and RECCS++ achieve speedups of up to 49x and 139x, respectively, on our benchmark datasets, with the additional performance gains of RECCS++ coming at the cost of a modest accuracy tradeoff. With these gains, RECCS++ can scale to networks with over 100 million nodes and nearly 2 billion edges.
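To make the preserved per-cluster characteristics concrete, here is a minimal Python sketch that computes them for an input network and clustering. It is not the RECCS implementation. The function name `cluster_stats` and the edge-list and label-dictionary inputs are hypothetical choices for illustration, and "connectivity" is simplified here to a plain connectedness check on each cluster's induced subgraph, which may be weaker than the measure the paper uses.

```python
from collections import defaultdict, deque


def cluster_stats(edges, labels):
    """Summarize the induced subgraph of each cluster (illustrative only).

    edges  : iterable of (u, v) undirected edges
    labels : dict mapping every node to its cluster id (disjoint clustering)
    """
    # Adjacency restricted to edges whose endpoints share a cluster.
    adj = {node: set() for node in labels}
    for u, v in edges:
        if u != v and labels[u] == labels[v]:
            adj[u].add(v)
            adj[v].add(u)

    # Group nodes by cluster id.
    members = defaultdict(list)
    for node, cid in labels.items():
        members[cid].append(node)

    stats = {}
    for cid, nodes in members.items():
        # Intra-cluster degrees, largest first.
        degrees = sorted((len(adj[n]) for n in nodes), reverse=True)

        # Breadth-first search from one member to test connectedness
        # (an assumed simplification of "connectivity").
        seen = {nodes[0]}
        queue = deque([nodes[0]])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

        stats[cid] = {
            "degree_sequence": degrees,
            "min_degree": degrees[-1],
            "connected": len(seen) == len(nodes),
        }
    return stats


# Toy input: a triangle (cluster "a") joined by one edge to a pair (cluster "b").
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
stats = cluster_stats(edges, labels)
```

Running the same function on the input network and on a generated network would give one simple way to compare how closely each synthetic cluster matches its input counterpart on these three properties.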
Similar Papers
Effective and Efficient Conductance-based Community Search at Billion Scale
Social and Information Networks
Finds better groups of connected things in big networks.
On the Optimization of Methods for Establishing Well-Connected Communities
Social and Information Networks
Finds important groups in huge online networks.
Contrastive clustering based on regular equivalence for influential node identification in complex networks
Social and Information Networks
Finds important people in online groups.