4. LSVM for Nonlinear Kernels
In this section of the paper we show how LSVM can be used to solve
classification problems with positive semidefinite nonlinear kernels.
Algorithm 1 and its convergence can be extended for such
nonlinear kernels as we show below. The only price paid for this
extension is that problems with large datasets can be handled using
the Sherman-Morrison-Woodbury (SMW) identity (1) only if the
inner product terms of the kernel [16, Equation (3)] are
explicitly known, which in general they are not. Nevertheless LSVM may
be a useful tool for classification with nonlinear kernels because of
its extreme simplicity, as we demonstrate below with simple MATLAB
code that makes use of neither the Sherman-Morrison-Woodbury
identity nor any optimization package.
We shall use the notation of [16]. For $A \in R^{m \times n}$ and
$B \in R^{n \times l}$, the kernel $K(A,B)$ maps
$R^{m \times n} \times R^{n \times l}$ into $R^{m \times l}$.
A typical kernel is the Gaussian kernel
$\varepsilon^{-\mu\|A_i' - B_{\cdot j}\|^2}$,
$i, j = 1, \ldots, m$, $l = m$,
where $\varepsilon$ is the base of natural logarithms, while
a linear kernel is $K(A,B) = AB$. For a column vector $x$ in $R^n$,
$K(x', A')$ is a row vector in $R^m$, and the linear separating
surface (4) is replaced by the nonlinear surface
\[
K\left(\begin{bmatrix} x' & -1 \end{bmatrix},
       \begin{bmatrix} A' \\ -e' \end{bmatrix}\right) Du = 0,
\tag{26}
\]
where $u$ is the solution of the dual problem (11)
with $Q$ re-defined for a general nonlinear kernel as follows:
\[
Q = \frac{I}{\nu} + D\,K(G,G')\,D, \qquad G = [A \;\; -e].
\tag{27}
\]
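For concreteness, the following minimal MATLAB sketch (our own illustration, not part of the paper's code; the name gaussker and the use of repmat are arbitrary choices) forms the Gaussian kernel matrix $K(G,G')$ for $G = [A \;\; -e]$:
function KM = gaussker(G,mu)
% sketch only: Gaussian kernel matrix KM with KM(i,j) = exp(-mu*||G_i'-G_j'||^2),
% where G_i denotes the i-th row of G = [A -e]
m = size(G,1);
sq = sum(G.^2,2);                                  % m x 1 squared row norms
dist2 = repmat(sq,1,m)+repmat(sq',m,1)-2*(G*G');   % all pairwise squared distances
KM = exp(-mu*dist2);                               % componentwise exponential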
Note that the nonlinear separating surface (26)
degenerates to the linear one (4) if we let $K(G,G') = GG'$
and make use of (9).
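To see this, note that for the linear kernel $K(G,G') = GG'$ the left-hand side of (26) becomes
\[
\begin{bmatrix} x' & -1 \end{bmatrix}
\begin{bmatrix} A' \\ -e' \end{bmatrix} Du
= x'A'Du + e'Du = x'w - \gamma,
\]
where the last equality uses the relations $w = A'Du$ and $\gamma = -e'Du$ of (9), so that (26) reduces to the linear separating surface $x'w = \gamma$ of (4).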
To justify the nonlinear kernel formulation (27)
we refer the reader to [16, Equation (8.9)]
for a similar result and give a brief derivation here.
If we rewrite our dual problem for a linear kernel (8)
in the equivalent form:
\[
\min_{0 \le u \in R^m} \; \frac{1}{2}\, u'\!\left(\frac{I}{\nu} + D\,GG'\,D\right)u - e'u,
\tag{28}
\]
and replace the linear kernel $GG'$ by a general nonlinear
positive semidefinite symmetric kernel $K(G,G')$, we obtain:
\[
\min_{0 \le u \in R^m} \; \frac{1}{2}\, u'\!\left(\frac{I}{\nu} + D\,K(G,G')\,D\right)u - e'u.
\tag{29}
\]
This is the formulation given above in (27).
We note that the Karush-Kuhn-Tucker necessary and sufficient
optimality conditions for this problem are:
\[
0 \le u \;\perp\; \left(\frac{I}{\nu} + D\,K(G,G')\,D\right)u - e \ge 0,
\tag{30}
\]
which underlie LSVM for a nonlinear positive semidefinite kernel
$K(G,G')$. The positive semidefiniteness of the nonlinear kernel
$K(G,G')$ is needed in order to ensure the existence of a solution
to both (29) and (30): it makes $Q$ positive definite and hence the
objective of (29) strictly convex.
All the results of the previous section remain valid, with $Q$
re-defined as above for any positive semidefinite
kernel $K(G,G')$. This includes the iterative scheme (15)
and the convergence result given under Algorithm 1.
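In particular, with $Q$ re-defined as in (27), the update carried out by Code 3 below (the iterative scheme (15)) is
\[
u^{i+1} = Q^{-1}\left(e + \left((Qu^{i} - e) - \alpha u^{i}\right)_{+}\right), \qquad i = 0, 1, 2, \ldots,
\]
with $0 < \alpha < \frac{2}{\nu}$; the code uses $\alpha = 1.9/\nu$, and $(\cdot)_+$ denotes replacing negative components by zero.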
However, because we do not make use of the Sherman-Morrison-Woodbury
identity for a nonlinear kernel, the LSVM MATLAB Code 3
is somewhat different and is as follows:
Code 3
LSVM MATLAB M-File for Nonlinear Kernel
function [it, opt, u] = svmlk(nu,itmax,tol,D,KM)
% lsvm with nonlinear kernel for min 1/2*u'*Q*u-e'*u s.t. u>=0
% Q=I/nu+DK(G,G')D, G=[A -e]
% Input: nu, itmax, tol, D, KM=K(G,G')
% [it, opt, u] = svmlk(nu,itmax,tol,D,KM);
m=size(KM,1);alpha=1.9/nu;e=ones(m,1);I=speye(m);it=0;
Q=I/nu+D*KM*D;P=inv(Q);            % P = Q^{-1} is computed once and reused
u=P*e;oldu=u+1;
while it<itmax & norm(oldu-u)>tol
   oldu=u;
   u=P*(1+pl(Q*u-1-alpha*u));      % iteration (15); the scalar 1 expands to e
   it=it+1;
end;
opt=norm(u-oldu);[it opt]
function pl = pl(x); pl = (abs(x)+x)/2;   % plus function (x)_+ = max(x,0)
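As a usage illustration only (the parameter values and the function gaussker sketched earlier are ours, not the paper's): given an $m \times n$ data matrix A and an $m \times 1$ vector d of $\pm 1$ labels, Code 3 could be invoked as follows.
% hypothetical driver for Code 3; nu, mu, itmax and tol are illustrative values
m = size(A,1); e = ones(m,1);
G = [A -e];                            % augmented data matrix G = [A -e]
D = spdiags(d,0,m,m);                  % diagonal matrix of +1/-1 labels
KM = gaussker(G,0.1);                  % kernel matrix K(G,G'), Gaussian with mu = 0.1
[it,opt,u] = svmlk(10,100,1e-5,D,KM);  % nu = 10, at most 100 iterations, tol = 1e-5
% classify a new point x (n x 1) by the sign of the nonlinear surface (26)
z = [x' -1];                                   % augmented test row [x' -1]
Kz = exp(-0.1*sum((repmat(z,m,1)-G).^2,2))';   % row vector K(z,G') for the same kernel
class = sign(Kz*D*u);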
Since we cannot use the Sherman-Morrison-Woodbury identity in the
general nonlinear case, this code for a nonlinear kernel is
effective only for moderately sized problems: the sizes of the
matrices $Q$ and $P$ in Code 3 scale quadratically with the number
of data points.
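For instance (illustrative arithmetic only): with $m = 4000$ data points, each of $KM$, $Q$ and $P$ is a dense $4000 \times 4000$ matrix with $1.6 \times 10^7$ entries, roughly 128 MB per matrix in 8-byte double precision, so memory rather than iteration count quickly becomes the limiting resource.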