Learning from sanctioned government suppliers: A machine learning and network science approach to detecting fraud and corruption in Mexico
By: Martí Medina-Hernández, Janos Kertész, Mihály Fazekas
Detecting fraud and corruption in public procurement remains a major challenge for governments worldwide. Most research to date relies on domain-knowledge-based corruption risk indicators built from individual contract-level features; some studies also analyze contracting network patterns. A critical barrier for supervised machine learning is the absence of confirmed non-corrupt (negative) examples, which makes conventional supervised classification inappropriate for this task. Using publicly available data on federally funded procurement in Mexico and company sanction records, this study implements positive-unlabeled (PU) learning algorithms that integrate domain-knowledge-based red flags with network-derived features to identify likely corrupt and fraudulent contracts. On average, the best-performing PU model captures 32 percent more known positives and performs 2.3 times better than random guessing, substantially outperforming approaches based solely on traditional red flags. An analysis of Shapley Additive Explanations (SHAP) shows that network-derived features are the most important, particularly those associated with contracts in the network core or with suppliers that have high eigenvector centrality. Traditional red flags further improve model performance in line with expectations, albeit mainly for contracts awarded through competitive tenders. This methodology can support law enforcement in Mexico and can be adapted to other national contexts.
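Because a PU setting has no confirmed negatives, one common way to judge a model is to check how many held-out known positives land near the top of its ranking and to compare that with a random ranking. The sketch below illustrates that idea only; the abstract does not spell out how the paper computes its "times better than random guessing" figure, so the function names, the top-fraction cutoff, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def recall_at_fraction(
    scores: Sequence[float],
    is_known_positive: Sequence[int],
    fraction: float,
) -> float:
    """Share of known positives ranked within the top `fraction` of contracts."""
    if len(scores) != len(is_known_positive):
        raise ValueError("scores and labels must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_positives = sum(is_known_positive)
    if total_positives == 0:
        raise ValueError("at least one known positive is required")

    # Rank contracts from highest to lowest model score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_top = max(1, round(fraction * len(scores)))
    hits = sum(is_known_positive[i] for i in ranked[:n_top])
    return hits / total_positives


def lift_over_random(
    scores: Sequence[float],
    is_known_positive: Sequence[int],
    fraction: float,
) -> float:
    """Recall in the top slice divided by the recall a random ranking expects."""
    n_top = max(1, round(fraction * len(scores)))
    # A random ranking captures, in expectation, n_top / n of the known positives.
    expected_random_recall = n_top / len(scores)
    return recall_at_fraction(scores, is_known_positive, fraction) / expected_random_recall


# Hypothetical toy data: ten contracts, three from sanctioned suppliers (label 1).
# Label 0 means "unlabeled", not "confirmed clean".
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(lift_over_random(toy_scores, toy_labels, fraction=0.3))
```

A lift above 1 means the ranking concentrates known positives more than chance would. Unlabeled contracts in the top slice are not counted as errors here, since some of them may be undetected positives.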