As was the case in our recent active set support vector machine (ASVM) approach [18], the following two simple changes were made to the standard linear SVM: (i) The margin (distance) between the parallel bounding planes was maximized with respect to both orientation ($w$) as well as location relative to the origin ($\gamma$). Such a change was also carried out in our successive overrelaxation (SOR) approach [17] as well as in the smooth support vector machine (SSVM) approach of [14]. See equation (7) and Figure 1. (ii) The error in the soft margin ($y$) was minimized using the 2-norm squared instead of the conventional 1-norm. See equation (7). These straightforward but important changes lead to a considerably simpler positive definite dual problem with nonnegativity constraints only. See equation (8). Although our use of a quadratic penalty for the soft margin as indicated in (ii) above is nonstandard, it has been used effectively in previous work [14,13,18]. In theory, this change could lead to a less robust classifier in the presence of outliers. However, this modification to the SVM does not seem to have a detrimental effect, as can be seen in the experimental results of Section 5.
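For concreteness, a formulation incorporating (i) and (ii) can be sketched as follows, up to notational details that are made precise in Section 2: $A$ denotes the $m \times n$ matrix of training points, $D$ the diagonal matrix of $\pm 1$ class labels, $e$ a vector of ones, and $\nu > 0$ the error weight; the exact statements are equations (7) and (8).
\[
\min_{(w,\gamma,y)}\ \frac{\nu}{2}\, y'y + \frac{1}{2}\,(w'w + \gamma^2)
\quad \mbox{s.t.} \quad D(Aw - e\gamma) + y \ge e,
\]
whose dual is a strictly convex quadratic program in the multipliers $u$ with nonnegativity constraints only:
\[
\min_{0 \le u}\ \frac{1}{2}\, u'\Big(\frac{I}{\nu} + D(AA' + ee')D\Big)u - e'u.
\]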
In Section 2 of the paper we begin with the standard SVM formulation and its dual, and then give our formulation and its dual. We note that computational evidence given in our previous work [18] shows that this alternative formulation does not compromise on generalization ability. Section 3 gives our simple iterative Lagrangian support vector machine (LSVM) Algorithm 1 and establishes its global linear convergence. LSVM, stated in 11 lines of MATLAB Code 2 below, solves once at the outset a single system of linear equations in $n+1$ variables, where $n$ is the dimensionality of the input space, given by a symmetric positive definite matrix. It then uses a linearly convergent iterative method to solve the problem. In Section 4 LSVM is extended to positive semidefinite nonlinear kernels. Section 5 describes our numerical results, which show that LSVM gives comparable or better testing results than those of other SVM algorithms, and in some cases is dramatically faster than a standard quadratic programming SVM solver.
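To give the flavor of this iterative method before the formal development, the following is a minimal MATLAB sketch, not the optimized Code 2 below: it assumes the dual matrix takes the form $Q = I/\nu + HH'$ with $H = D[A\ \ {-e}]$ (so that $HH' = D(AA'+ee')D$ as in the dual sketched above), performs a plus-function fixed-point step with a parameter $\alpha$, and omits the Sherman-Morrison-Woodbury economization used later.

% Minimal sketch (assumed names and form; see Algorithm 1 and Code 2 for
% the actual method): solve min 1/2*u'*Q*u - e'*u  s.t. u >= 0,
% with Q = I/nu + H*H', H = D*[A -e], by a plus-function iteration.
function u = lsvm_sketch(A, d, nu, alpha, itmax, tol)
  % A: m x n data matrix; d: m x 1 vector of +1/-1 labels; nu: error weight;
  % alpha: step parameter with 0 < alpha < 2/nu (e.g. 1.9/nu);
  % itmax, tol: iteration limit and stopping tolerance.
  m = size(A, 1);
  e = ones(m, 1);
  H = diag(d) * [A -e];
  Q = eye(m) / nu + H * H';          % symmetric positive definite
  u = Q \ e;                         % single linear solve at the outset
  for it = 1:itmax
      uold = u;
      u = Q \ (e + max((Q * uold - e) - alpha * uold, 0));  % plus function
      if norm(u - uold) <= tol, break; end
  end
end

Under the formulation sketched above, the bounding plane parameters can then be recovered from the dual solution as $w = A'Du$ and $\gamma = -e'Du$.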
We now describe our notation and give some background material. All vectors will be column vectors unless transposed to a row vector by a prime $'$. For a vector $x$ in the $n$-dimensional real space $R^n$, $x_+$ denotes the vector in $R^n$ with all of its negative components set to zero. This corresponds to projecting $x$ onto the nonnegative orthant.
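For example, for $x = (1, -2, 3)'$ we have $x_+ = (1, 0, 3)'$; in MATLAB (the language of Code 2 below) this projection is just a componentwise maximum with zero:

xplus = max(x, 0);   % negative components of x are set to zero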
The base of the natural logarithms will be denoted by $\varepsilon$, and for a vector $y \in R^m$, $\varepsilon^{-y}$ will denote a vector in $R^m$ with components $\varepsilon^{-y_i}$, $i = 1, \ldots, m$. The notation $A \in R^{m \times n}$ will signify a real $m \times n$ matrix. For such a matrix, $A'$ will denote the transpose of $A$, $A_i$ will denote the $i$-th row of $A$, and $A_{\cdot j}$ will denote the $j$-th column of $A$. A vector of ones or zeroes in a real space of arbitrary dimension will be denoted by $e$ or $0$, respectively. The identity matrix of arbitrary dimension will be denoted by $I$. For two vectors $x$ and $y$ in $R^n$, $x \perp y$ denotes orthogonality, that is $x'y = 0$. We use $:=$ to denote definition. The 2-norm of a vector $x$ and a matrix $Q$ will be denoted by $\|x\|$ and $\|Q\|$ respectively.
A separating plane, with respect to two given point sets $\mathcal{A}$ and $\mathcal{B}$ in $R^n$, is a plane that attempts to separate $R^n$ into two halfspaces such that each open halfspace contains points mostly of $\mathcal{A}$ or $\mathcal{B}$.
A special case of the Sherman-Morrison-Woodbury (SMW) identity
[7] will be utilized: