Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information
By: Antoine Ledent, Mun Chong Soo, Nong Minh Hieu
Potential Business Impact:
Helps movie apps guess what you'll like.
We study a matrix completion problem where both the ground truth $R$ matrix and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \textit{share a common subspace}. We assume that a large amount $M$ of \textit{unlabeled} data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to `implicit feedback' (consisting in interactions such as purchase, click, etc. ) and the labeled data corresponds to the `explicit feedback', consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $M$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimations of $P$ and and the ground truth matrix $\ground$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, demonstrating that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedbacks in recommender systems.
Similar Papers
Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information
Machine Learning (CS)
Helps movie sites guess what you'll like.
Statistical Inference for Matching Decisions via Matrix Completion under Dependent Missingness
Methodology
Helps match people to jobs better.
On the sample complexity of semi-supervised multi-objective learning
Machine Learning (Stat)
Teaches computers many things with less labeled examples.