
Privacy-Preserving Feature Valuation in Vertical Federated Learning Using Shapley-CMI and PSI Permutation

Published: December 16, 2025 | arXiv ID: 2512.14767v1

By: Unai Laskurain, Aitor Aguirre-Ortuzar, Urko Zurutuza

Potential Business Impact:

Helps train AI without sharing private user data.

Business Areas:

Fraud Detection Financial Services, Payments, Privacy and Security

Federated Learning (FL) is an emerging machine learning paradigm that enables multiple parties to collaboratively train models without sharing raw data. In Vertical FL (VFL), where each party holds different features for the same users, a key challenge is to evaluate each party's feature contribution in the early stages, before any model has been trained. To address this, the Shapley-CMI method was recently proposed as a model-free, information-theoretic approach to feature valuation based on Conditional Mutual Information (CMI). However, its original formulation did not provide a practical implementation capable of computing the required permutations and intersections securely. This paper presents a novel privacy-preserving implementation of Shapley-CMI for VFL. The system introduces a private set intersection (PSI) server that performs all necessary feature permutations and computes encrypted intersection sizes across discretized and encrypted ID groups, with no exchange of raw data. Each party then uses these intersection results to compute Shapley-CMI values, which quantify the marginal utility of its features. Initial experiments confirm the correctness and privacy of the proposed system, demonstrating its viability for secure and efficient feature contribution estimation in VFL. The approach keeps data confidential, scales across multiple parties, and enables fair data valuation without sharing raw data or training models.
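To make the valuation step in the abstract concrete, here is a minimal plaintext sketch of how Shapley-CMI values could be derived from intersection sizes alone. It assumes that the marginal utility of a feature is the CMI between the label and that feature, given the features that precede it in a permutation, and that this CMI is estimated as a difference of conditional entropies computed from group intersection counts. The function names (`groups`, `cond_entropy`, `shapley_cmi`) are hypothetical, and the sketch omits the encryption and the PSI server entirely, so it illustrates only the arithmetic, not the paper's protocol.

```python
from itertools import permutations, product
from math import log2


def groups(column):
    """Partition row IDs by discretized value: {value: set of row IDs}."""
    out = {}
    for row_id, value in enumerate(column):
        out.setdefault(value, set()).add(row_id)
    return out


def cond_entropy(label_groups, feature_groups, n):
    """Estimate H(Y | X_S) using only the sizes of ID-group intersections.

    feature_groups is a list with one {value: ids} dict per feature in S.
    With an empty list this reduces to the plain entropy H(Y).
    """
    h = 0.0
    # Each combination of one group per feature is one joint cell of X_S.
    for combo in product(*(g.values() for g in feature_groups)):
        ids = set(range(n))
        for group in combo:
            ids &= group
        if not ids:
            continue
        for y_ids in label_groups.values():
            c = len(ids & y_ids)  # the intersection size a PSI step would return
            if c:
                h -= (c / n) * log2(c / len(ids))
    return h


def shapley_cmi(features, labels):
    """Average each feature's CMI gain over all feature permutations.

    Exhaustive enumeration, so only practical for a handful of features.
    """
    n = len(labels)
    y = groups(labels)
    fg = {name: groups(col) for name, col in features.items()}
    perms = list(permutations(fg))
    values = dict.fromkeys(fg, 0.0)
    for perm in perms:
        prefix = []
        h_prev = cond_entropy(y, prefix, n)
        for name in perm:
            prefix.append(fg[name])
            h_cur = cond_entropy(y, prefix, n)
            # I(Y; X_name | X_prefix) = H(Y | X_prefix) - H(Y | X_prefix, X_name)
            values[name] += (h_prev - h_cur) / len(perms)
            h_prev = h_cur
    return values


if __name__ == "__main__":
    # Toy data: the label is the XOR of two binary features held by two parties.
    feats = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
    print(shapley_cmi(feats, [0, 1, 1, 0]))
```

In the XOR toy case neither feature is informative alone, yet together they determine the label, so the averaging over permutations splits the credit evenly between them. In a privacy-preserving setting, the `len(ids & y_ids)` counts are the quantities that would come from the PSI step rather than from raw columns.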

PRIVEE: Privacy-Preserving Vertical Federated Learning Against Feature Inference Attacks

Machine Learning (CS)

Keeps private data safe during shared learning.

14 Dec 2025

90%

Data Privatization in Vertical Federated Learning with Client-wise Missing Problem

Methodology

Keeps private data safe when learning from many sources.

25 Nov 2025

89%

The Sherpa.ai Blind Vertical Federated Learning Paradigm to Minimize the Number of Communications

Machine Learning (CS)

Lets computers learn from private data safely.

19 Oct 2025


Page Count

7 pages
