We propose to select a new set of q variables such that the overall criterion will be optimized. In order to select such a working set, we use the same idea as Joachims [5]: simply search for the optimal gradient-descent direction p which is feasible and which has only q non-null components. The variables corresponding to these components are chosen to be the new working set .
We thus need to minimize:
Since we are searching for an optimal descent direction, which is a direction where the scalar product with the gradient is the smallest, we want indeed to minimize (6). The conditions (7) and (8) are necessary to ensure the feasibility of the obtained direction. The condition (9) is there only to ensure that the problem has a solution. Finally, (10) is imposed because we are searching for a direction with only q non-null components.
Note that the derivative of
can be easily computed:
In order to solve this problem, it thus suffices to consider
Since we are searching for exactly q variables, the must be distinct. We could have to reduce q if one variable is selected twice.
Proof: Let us go back to the minimization
problem of the function
In the case where 2l = q = 2r, it is easy to see that the minimum is obtained for zi = -1 if and zi = 1 if : if for instance zi0 is augmented by for a , then one needs to compensate by another zj to maintain (12). Since we want to minimize (11), the best thing to do, knowing that the are sorted in reverse order and keeping in mind the constraint (13), is to modify zq and thus to fix . Equation (11) is then augmented by , which is a positive value because the are sorted and thus we get out of the minimum.
In the case where
2l > q = 2r, suppose we found a
z
which is a solution
of (11). Let us denote
the q indices of
the components of
z which are non-null. Using the same
argument as in the previous paragraph, it is clear that
and that
.
In other words, if
z is a solution of our problem, then
we necessarily have
.
Considering again the order
of the ,
it becomes evident that we have to take
and
.
Our new working set is then composed of the q variables corresponding to the indices ci (where an index corresponds to and an index ci > l corresponds to ).