PCA++: How Uniformity Induces Robustness to Background Noise in Contrastive Learning

Published: November 15, 2025 | arXiv ID: 2511.12278v1

By: Mingqi Wu, Qiang Sun, Yi Yang

Potential Business Impact:

Find hidden patterns in messy data.

Business Areas:
Image Recognition Data and Analytics, Software

High-dimensional data often contain low-dimensional signals obscured by structured background noise, which limits the effectiveness of standard PCA. Motivated by contrastive learning, we address the problem of recovering shared signal subspaces from positive pairs (paired observations that share the same signal but differ in background). Our baseline, PCA+, uses alignment-only contrastive learning and succeeds when background variation is mild, but fails under strong noise or in high-dimensional regimes. To address this, we introduce PCA++, a hard uniformity-constrained contrastive PCA that enforces identity covariance on projected features. PCA++ has a closed-form solution via a generalized eigenproblem, remains stable in high dimensions, and provably regularizes against background interference. We provide exact high-dimensional asymptotics in both fixed-aspect-ratio and growing-spike regimes, showing uniformity's role in robust signal recovery. Empirically, PCA++ outperforms standard PCA and alignment-only PCA+ on simulations, corrupted MNIST, and single-cell transcriptomics, reliably recovering condition-invariant structure. More broadly, we clarify uniformity's role in contrastive learning, showing that explicit feature dispersion defends against structured noise and enhances robustness.
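The abstract describes PCA++ as alignment between positive pairs subject to a hard uniformity (identity-covariance) constraint, solvable in closed form as a generalized eigenproblem. A minimal sketch of that idea, assuming a symmetrized cross-covariance as the alignment matrix and the pooled per-view covariance as the constraint matrix (the paper's exact matrices may differ):

```python
import numpy as np
from scipy.linalg import eigh

def pca_plus_plus(X1, X2, k):
    """Hedged sketch: recover a shared k-dim subspace from positive pairs.

    X1, X2: (n, d) paired views assumed to share the same signal but
    differ in background. The uniformity constraint V^T S V = I is
    enforced by solving a generalized eigenproblem against the pooled
    covariance S, so projected features have identity covariance.
    """
    n, d = X1.shape
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    # Alignment: symmetrized cross-covariance of the positive pair.
    A = (X1c.T @ X2c + X2c.T @ X1c) / (2 * n)
    # Uniformity constraint matrix: pooled covariance across both views,
    # with a small ridge for numerical stability (an assumption, not
    # necessarily part of the paper's formulation).
    S = (X1c.T @ X1c + X2c.T @ X2c) / (2 * n) + 1e-8 * np.eye(d)
    # Generalized eigenproblem A v = lambda S v; eigh returns eigenvectors
    # normalized so that V^T S V = I, i.e. the hard uniformity constraint.
    evals, evecs = eigh(A, S)
    order = np.argsort(evals)[::-1][:k]   # top-k alignment directions
    return evecs[:, order]
```

Because `scipy.linalg.eigh(A, S)` returns `S`-orthonormal eigenvectors, the projected features automatically satisfy the identity-covariance constraint, which is what distinguishes this from alignment-only PCA+ (which would diagonalize `A` alone).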

Country of Origin
🇨🇦 Canada

Page Count
41 pages

Category
Statistics: Machine Learning (stat.ML)