Minimal Kernel Classifiers

Glenn M. Fung, Olvi L. Mangasarian, Alexander J. Smola; 3(Nov):303-321, 2002.

Abstract

A finite concave minimization algorithm is proposed for constructing kernel classifiers that use a minimal number of data points both in generating and characterizing a classifier. The algorithm is theoretically justified on the basis of linear programming perturbation theory and a leave-one-out error bound, as well as effective computational results on seven real world datasets. A nonlinear rectangular kernel is generated by systematically utilizing as few data points as possible both in training and in characterizing a nonlinear separating surface. This can result in a substantial reduction in kernel data-dependence (over 94% in six of the seven public datasets tested), with test set correctness equal to that obtained by using a conventional support vector machine classifier that depends on many more data points. This reduction in data dependence results in a much faster classifier that requires less storage. To eliminate data points, the proposed approach makes use of a novel loss function, the "pound" function (·)#, which is a linear combination of the 1-norm and the step function that measures both the magnitude and the presence of any error.
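To make the two key ingredients of the abstract concrete, the following is a minimal sketch, not the authors' code: a pound-style loss combining the 1-norm of the error with a step-function count of errors, and a rectangular Gaussian kernel evaluated only against a retained subset of the data. The names (pound_loss, gaussian_kernel, classify), the weighting parameter lam, and the kernel width are illustrative assumptions; the paper's actual formulation is solved via linear programming rather than direct evaluation of this loss.

    import numpy as np

    def pound_loss(r, lam=0.5, eps=1e-6):
        """Sketch of the "pound" loss (.)#: a linear combination of the
        1-norm (error magnitude) and the step function (error presence).
        lam is an assumed weighting parameter, not a value from the paper."""
        r = np.maximum(r, 0.0)            # only constraint violations contribute
        magnitude = np.sum(r)             # 1-norm term: total size of the errors
        presence = np.sum(r > eps)        # step-function term: number of errors
        return (1.0 - lam) * magnitude + lam * presence

    def gaussian_kernel(A, B, width=1.0):
        """K(A, B'): rectangular when B holds only the retained data points."""
        sq = np.sum(A**2, 1)[:, None] - 2.0 * A @ B.T + np.sum(B**2, 1)[None, :]
        return np.exp(-width * sq)

    def classify(X, B, u, gamma, width=1.0):
        """Rectangular-kernel classifier: predictions depend only on the small
        subset B of the training data, so evaluation is fast and storage small."""
        return np.sign(gaussian_kernel(X, B, width) @ u - gamma)

Because the classifier above touches only the rows kept in B, eliminating data points (as the pound loss encourages) directly shrinks both prediction time and the memory needed to store the classifier.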
