RedunCut: Measurement-Driven Sampling and Accuracy Performance Modeling for Low-Cost Live Video Analytics
By: Gur-Eyal Sela , Kumar Krishna Agrawal , Bharathan Balaji and more
Potential Business Impact:
Makes video analysis cheaper and faster.
Live video analytics (LVA) runs continuously across massive camera fleets, but inference cost with modern vision models remains high. To address this, dynamic model size selection (DMSS) is an attractive approach: it is content-aware but treats models as black boxes, and could potentially reduce cost by up to 10x without model retraining or modification. Without ground truth labels at runtime, we observe that DMSS methods use two stages per segment: (i) sampling a few models to calculate prediction statistics (e.g., confidences), then (ii) selection of the model size from those statistics. Prior systems fail to generalize to diverse workloads, particularly to mobile videos and lower accuracy targets. We identify that the failure modes stem from inefficient sampling whose cost exceeds its benefit, and inaccurate per-segment accuracy prediction. In this work, we present RedunCut, a new DMSS system that addresses both: It uses a measurement-driven planner that estimates the cost-benefit tradeoff of sampling, and a lightweight, data-driven performance model to improve accuracy prediction. Across road-vehicle, drone, and surveillance videos and multiple model families and tasks, RedunCut reduces compute cost by 14-62% at fixed accuracy and remains robust to limited historical data and to drift.
Similar Papers
StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding
CV and Pattern Recognition
Makes videos understandable for computers faster.
MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples
CV and Pattern Recognition
Finds hidden flaws in products without examples.
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
CV and Pattern Recognition
Helps computers understand long videos faster.