On pattern classification with weighted dimensions
By: Ayatullah Faruk Mollah
Potential Business Impact:
Helps computers sort information better by weighing importance.
Studies on various facets of pattern classification are often imperative when working with multi-dimensional samples from diverse application scenarios. In this context, a distance measure with weighted dimensions is a vital consideration in pattern analysis, as it reflects the degree of similarity between samples. Though the matter is often presumed settled by the pervasive use of Euclidean distance, a plethora of issues often surface. In this paper, we present (a) a detailed analysis of the impact of distance measure norms and dimension weights, along with visualization, (b) a novel weighting scheme for each dimension, (c) incorporation of this dimensional weighting scheme into a KNN classifier, and (d) pattern classification on a variety of synthetic as well as realistic datasets with the developed model. The model performs well across diverse experiments in comparison to the traditional KNN under the same experimental setups. Specifically, for gene expression datasets, it yields a significant and consistent gain in classification accuracy (around 10%) in all cross-validation experiments with different values of k. As such datasets contain a limited number of high-dimensional samples, meaningful selection of nearest neighbours is desirable, and this requirement is reasonably met by regulating the shape and size of the region enclosing the k reference samples with the developed weighting scheme and an appropriate norm. The resulting classifier, a KNN powered by weighted Minkowski distance with the present weighting scheme, therefore stands as an important generalization of the KNN classifier.
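To make the idea of a KNN classifier driven by weighted Minkowski distance concrete, the following is a minimal sketch, not the paper's implementation. It assumes the common form d(x, y) = (sum_i w_i |x_i - y_i|^p)^(1/p). The abstract does not state the proposed weighting scheme, so the weight vector `w_demo`, the toy data, and the function names `weighted_minkowski` and `knn_predict` are all hypothetical and chosen only for illustration.

```python
import numpy as np
from collections import Counter


def weighted_minkowski(x, y, w, p=2.0):
    """Weighted Minkowski distance: (sum_i w_i * |x_i - y_i|**p) ** (1/p)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(np.asarray(w, dtype=float) * diff ** p) ** (1.0 / p))


def knn_predict(X_train, y_train, x, w, k=3, p=2.0):
    """Majority vote among the k training samples closest to x."""
    dists = [weighted_minkowski(row, x, w, p) for row in X_train]
    nearest = np.argsort(dists, kind="stable")[:k]
    labels = [y_train[int(i)] for i in nearest]
    return Counter(labels).most_common(1)[0][0]


# Toy data (hypothetical): dimension 0 separates the classes,
# dimension 1 is a large-scale nuisance feature.
X_demo = [[0.0, 0.0], [0.0, 1.0], [1.0, 5.0], [1.0, 6.0]]
y_demo = [0, 0, 1, 1]
query = [0.2, 5.0]

w_uniform = [1.0, 1.0]   # p = 2 with unit weights is plain Euclidean KNN
w_demo = [1.0, 0.001]    # hand-picked weights, NOT the paper's scheme

pred_uniform = knn_predict(X_demo, y_demo, query, w_uniform, k=3)
pred_weighted = knn_predict(X_demo, y_demo, query, w_demo, k=3)
```

With unit weights the nuisance dimension dominates the distance and the query is assigned to class 1; down-weighting that dimension reshapes the neighbourhood so that the vote goes to class 0. Changing `p` alters the shape of the region enclosing the k neighbours in the same spirit.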
Similar Papers
Optimizing Data-driven Weights In Multidimensional Indexes
Econometrics
Makes important decisions fairer by choosing better weights.
Learning Filter-Aware Distance Metrics for Nearest Neighbor Search with Multiple Filters
Machine Learning (CS)
Finds data that matches specific tags faster.
A general framework for adaptive nonparametric dimensionality reduction
Machine Learning (Stat)
Finds best way to show complex data simply.