Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities
By: Verónica Álvarez, Santiago Mazuelas, Steven An and more
Potential Business Impact:
Makes computer guesses for data more trustworthy.
The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that provide rough guesses for labels. Weak LFs commonly provide guesses with assorted types and unknown interdependences that can result in unreliable predictions. Furthermore, existing techniques for programmatic weak supervision cannot provide assessments for the reliability of the probabilistic predictions for labels. This paper presents a methodology for programmatic weak supervision that can provide confidence intervals for label probabilities and obtain more reliable predictions. In particular, the methods proposed use uncertainty sets of distributions that encapsulate the information provided by LFs with unrestricted behavior and typology. Experiments on multiple benchmark datasets show the improvement of the presented methods over the state-of-the-art and the practicality of the confidence intervals presented.
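To make the setting concrete, here is a minimal sketch of how weak labeling functions can yield a probabilistic label for one unlabeled example. Everything in it is an illustrative assumption: the spam-detection task, the three LFs, and the `vote_probability` helper are hypothetical and do not come from the paper. The aggregation is plain vote averaging, a naive baseline, and not the uncertainty-set method the paper proposes; it produces a point estimate only, with no confidence interval.

```python
from typing import Callable, List, Optional

# Hypothetical binary task: 1 = spam, 0 = not spam, None = the LF abstains.
ABSTAIN = None

def lf_contains_free(text: str) -> Optional[int]:
    # Rough guess: messages mentioning "free" are spam.
    return 1 if "free" in text.lower() else ABSTAIN

def lf_has_greeting(text: str) -> Optional[int]:
    # Rough guess: messages that open with a greeting are not spam.
    return 0 if text.lower().startswith(("hi", "hello")) else ABSTAIN

def lf_many_exclamations(text: str) -> Optional[int]:
    # Rough guess: three or more exclamation marks suggest spam.
    return 1 if text.count("!") >= 3 else ABSTAIN

LFS: List[Callable[[str], Optional[int]]] = [
    lf_contains_free,
    lf_has_greeting,
    lf_many_exclamations,
]

def vote_probability(text: str, lfs=LFS) -> float:
    """Naive estimate of P(label = 1): the share of non-abstaining LFs voting 1.

    Returns 0.5 when every LF abstains. This treats all LFs as equally
    accurate and independent, which is exactly the kind of assumption that
    can make weak-supervision predictions unreliable.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return 0.5
    return sum(votes) / len(votes)

if __name__ == "__main__":
    for msg in ["FREE prize!!! Click now", "Hi, get your free ticket"]:
        print(msg, "->", vote_probability(msg))
```

In the second message the two firing LFs disagree, so the averaged estimate is 0.5, and nothing in this baseline says how much that number can be trusted. That missing reliability assessment is the gap the abstract describes.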
Similar Papers
A Unified and Stable Risk Minimization Framework for Weakly Supervised Learning with Theoretical Guarantees
Machine Learning (CS)
Teaches computers with less information.
Neuro-symbolic Weak Supervision: Theory and Semantics
Artificial Intelligence
Makes smart programs learn better from messy information.
Learning from Similarity-Confidence and Confidence-Difference
Machine Learning (CS)
Teaches computers with fewer correct examples.