As was the case in our recent active set support vector machine (ASVM)
approach [18], the following two simple changes were made to
the standard linear SVM: (i) The margin (distance) between the
parallel bounding planes was maximized with respect to both
orientation ($w$) as well as location relative to the origin
($\gamma$). Such a change was also carried out in our successive
overrelaxation (SOR) approach [17] as well as in the smooth
support vector machine (SSVM) approach of [14]. See equation
(7) and Figure 1. (ii) The error in the soft margin ($y$) was
minimized using the 2-norm squared instead of the conventional
1-norm. See equation (7). These straightforward but important
changes lead to a considerably simpler
positive definite dual problem with nonnegativity constraints
only. See equation (8). Although our use of a
quadratic penalty for the soft margin as indicated in (ii) above is
nonstandard, it has been used effectively in previous work
[14,13,18]. In theory, this change could lead to a
less robust classifier in the presence of outliers. However, this
modification to the SVM does not seem to have a detrimental effect, as
can be seen in the experimental results of Section 5.
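To fix ideas before the formal development of Section 2, the changes in
(i) and (ii) lead to a problem of roughly the following form, where $A$
denotes the $m \times n$ matrix of training points, $D$ the $m \times m$
diagonal matrix of $\pm 1$ class labels, $e$ a vector of ones and
$\nu > 0$ a penalty parameter (the precise statement and scaling are
those of equation (7)):
\[
\min_{(w,\gamma,y)} \ \frac{\nu}{2}\,y'y + \frac{1}{2}\,(w'w + \gamma^2)
\quad \mbox{s.t.} \quad D(Aw - e\gamma) + y \geq e,
\]
whose dual, as noted above, has a positive definite Hessian and
nonnegativity constraints only (see equation (8)):
\[
\min_{0 \leq u \in R^m} \ \frac{1}{2}\,
u'\Big(\frac{I}{\nu} + D(AA' + ee')D\Big)u - e'u .
\]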
In Section 2 of the paper we begin with the standard SVM
formulation and its dual and then give our formulation and its
dual. We note that computational evidence given in our previous work
[18] shows that this alternative formulation does not
compromise on generalization ability. Section 3 gives
our simple iterative Lagrangian support vector machine (LSVM)
Algorithm 1 and establishes its global linear convergence.
LSVM, stated in 11 lines of MATLAB Code 2 below, solves
once at the outset a single system of equations in $n+1$
variables given by a symmetric positive definite matrix. It then uses
a linearly convergent iterative method to solve the problem. In
Section 4 LSVM is extended to positive semidefinite
nonlinear kernels. Section 5 describes our numerical
results which show that LSVM gives comparable or better testing
results than those of other SVM algorithms, and in some cases is
dramatically faster than a standard quadratic programming SVM solver.
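To give a concrete sense of how little machinery such an iteration needs,
the following is a minimal MATLAB sketch of a projection-type scheme for
the dual problem indicated above. It is illustrative only and is not the
paper's Code 2: in particular, it factors the full $m \times m$ dual matrix
directly, whereas Code 2 exploits the SMW identity so that only an
$(n+1) \times (n+1)$ system is ever solved. The function name lsvm_sketch,
the arguments itmax and tol, the step size and the stopping test are
illustrative assumptions; A is the $m \times n$ data matrix, d the
$m$-vector of $\pm 1$ labels and nu the penalty parameter.

function [w, gamma] = lsvm_sketch(A, d, nu, itmax, tol)
% Illustrative sketch (not the paper's Code 2) of an iterative scheme for
% the dual  min_{u>=0} 0.5*u'*Q*u - e'*u  with  Q = I/nu + H*H', H = D*[A -e].
m = size(A, 1);
e = ones(m, 1);
H = spdiags(d, 0, m, m) * [A -e];   % H = D*[A -e]
Q = eye(m)/nu + H*H';               % symmetric positive definite dual matrix
R = chol(Q);                        % factor Q once at the outset
solveQ = @(b) R \ (R' \ b);         % apply Q^{-1} via the Cholesky factor
alpha = 1.9/nu;                     % step size, assumed to lie in (0, 2/nu)
u = solveQ(e);                      % starting point
for it = 1:itmax
    % (.)_+ is the projection onto the nonnegative orthant, i.e. max(., 0)
    unew = solveQ(e + max((Q*u - e) - alpha*u, 0));
    if norm(unew - u) <= tol, u = unew; break; end
    u = unew;
end
w = A' * (d .* u);                  % plane orientation:  w = A'*D*u
gamma = -e' * (d .* u);             % plane location:     gamma = -e'*D*u
end

The resulting plane $x'w = \gamma$ then classifies a new point $x$ by the
sign of $x'w - \gamma$.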
We now describe our notation and give some background material.
All vectors will be column vectors unless transposed to a row
vector by a prime $'$.
For a vector $x$ in the $n$-dimensional real space $R^n$, $x_+$
denotes the vector in $R^n$ with all of its negative components set
to zero. This corresponds to projecting $x$ onto the nonnegative
orthant.
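For example, with $x = (-1, 2, 0)'$ we have $x_+ = (0, 2, 0)'$.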
The base of the natural logarithms will be denoted by
$\varepsilon$, and for a vector $x \in R^m$, $\varepsilon^{-x}$ will
denote a vector in $R^m$ with components $\varepsilon^{-x_i}$,
$i = 1, \ldots, m$.
The notation $A \in R^{m \times n}$ will
signify a real $m \times n$ matrix. For such a matrix, $A'$ will
denote the transpose of $A$, $A_i$ will denote the $i$-th row of $A$
and $A_{\cdot j}$ will denote the $j$-th column of $A$.
A vector of ones or zeroes in a real space of arbitrary dimension will be
denoted by $e$ or $0$, respectively.
The identity matrix of arbitrary dimension will be denoted by $I$.
For two vectors $x$ and $y$ in $R^n$, $x \perp y$ denotes orthogonality,
that is $x'y = 0$.
We use $:=$ to denote definition. The 2-norm of a vector $x$ and a
matrix $Q$ will be denoted by $\|x\|$ and $\|Q\|$, respectively.
A separating plane, with respect to two given point sets $\mathcal{A}$
and $\mathcal{B}$ in $R^n$, is a plane that attempts
to separate $R^n$ into two halfspaces such that each open halfspace
contains points mostly of $\mathcal{A}$ or $\mathcal{B}$.
A special case of the Sherman-Morrison-Woodbury (SMW) identity
[7] will be utilized: