Analyzing Deviations from Monotonic Trends through Database Repair
By: Shunit Agmon , Jonathan Gal , Amir Gilad and more
Datasets often exhibit violations of expected monotonic trends - for example, higher education level correlating with higher average salary, newer homes being more expensive, or diabetes prevalence increasing with age. We address the problem of quantifying how far a dataset deviates from such trends. To this end, we introduce Aggregate Order Dependencies (AODs), an aggregation-centric extension of the previously studied order dependencies. An AOD specifies that the aggregated value of a target attribute (e.g., mean salary) should monotonically increase or decrease with the grouping attribute (e.g., education level). We formulate the AOD repair problem as finding the smallest set of tuples to delete from a table so that the given AOD is satisfied. We analyze the computational complexity of this problem and propose a general algorithmic template for solving it. We instantiate the template for common aggregation functions, introduce optimization techniques that substantially improve the runtime of the template instances, and develop efficient heuristic alternatives. Our experimental study, carried out on both real-world and synthetic datasets, demonstrates the practical efficiency of the algorithms and provides insight into the performance of the heuristics. We also present case studies that uncover and explain unexpected AOD violations using our framework.
Similar Papers
Aggregation Hides Out-of-Distribution Generalization Failures from Spurious Correlations
Machine Learning (CS)
Finds hidden computer mistakes in new situations.
Minimizing Age of Detection for a Markov Source over a Lossy Channel
Information Theory
Helps machines know when to check important things.
Bias as a Virtue: Rethinking Generalization under Distribution Shifts
Machine Learning (CS)
Makes computer learning work better on new data.