Ensemble Performance Through the Lens of Linear Independence of Classifier Votes in Data Streams
By: Enes Bektas, Fazli Can
Potential Business Impact:
Makes computer predictions better by picking the right number of models to combine.
Ensemble learning improves classification performance by combining multiple base classifiers. While increasing the number of classifiers generally enhances accuracy, excessively large ensembles can lead to computational inefficiency and diminishing returns. This paper investigates the relationship between ensemble size and performance through the lens of linear independence among classifier votes in data streams. We propose that ensembles composed of linearly independent classifiers maximize representational capacity, particularly under a geometric model. We then generalize the importance of linear independence to the weighted majority voting problem. By modeling the probability of achieving linear independence among classifier outputs, we derive a theoretical framework that explains the trade-off between ensemble size and accuracy. Our analysis leads to a theoretical estimate of the ensemble size required to achieve a user-specified probability of linear independence. We validate our theory through experiments on both real-world and synthetic datasets using two ensemble methods, OzaBagging and GOOWE. Our results confirm that this theoretical estimate effectively identifies the point of performance saturation for robust ensembles like OzaBagging. Conversely, for complex weighting schemes like GOOWE, our framework reveals that high theoretical diversity can trigger algorithmic instability. Our implementation is publicly available to support reproducibility and future research.
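The idea of estimating the ensemble size needed to reach a target probability of linear independence can be illustrated with a small Monte Carlo sketch. This is not the authors' model or implementation. It assumes a toy setting in which each classifier casts a one-hot vote for a class chosen uniformly at random, and it asks how often the stacked vote matrix has full column rank, meaning the votes contain as many linearly independent vectors as there are classes. The function names `prob_full_rank` and `smallest_size` and all parameters are hypothetical.

```python
import numpy as np


def prob_full_rank(n_classifiers, n_classes, trials=2000, seed=0):
    """Estimate P(vote matrix has rank n_classes) under a toy model.

    Assumption (not from the paper): each classifier votes one-hot for a
    class drawn uniformly at random, independently of the others.
    """
    rng = np.random.default_rng(seed)
    identity = np.eye(n_classes)
    hits = 0
    for _ in range(trials):
        votes = rng.integers(0, n_classes, size=n_classifiers)
        vote_matrix = identity[votes]  # shape: (n_classifiers, n_classes)
        if np.linalg.matrix_rank(vote_matrix) == n_classes:
            hits += 1
    return hits / trials


def smallest_size(n_classes, target, max_size=200, trials=2000, seed=0):
    """Return the smallest ensemble size whose estimated probability of
    full rank meets the target, or None if max_size is not enough."""
    for n in range(n_classes, max_size + 1):
        if prob_full_rank(n, n_classes, trials=trials, seed=seed) >= target:
            return n
    return None


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(n, prob_full_rank(n, n_classes=3))
    print("size for 90% target:", smallest_size(n_classes=3, target=0.9))
```

In this toy model, fewer classifiers than classes can never reach full rank, and the estimated probability rises toward 1 as the ensemble grows, which mirrors the trade-off between ensemble size and diminishing returns described in the abstract. Real classifier votes in a data stream are correlated and not uniform, so the numbers here are only illustrative.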
Similar Papers
A Cooperative Game-Based Multi-Criteria Weighted Ensemble Approach for Multi-Class Classification
Machine Learning (CS)
Makes AI smarter by combining different "brains."
Option Pricing Using Ensemble Learning
Machine Learning (CS)
Makes computer stock predictions more accurate.
The cost of ensembling: is it always worth combining?
Machine Learning (CS)
Makes computer predictions faster and cheaper.