Recently, many authors have proposed decomposition algorithms for regression. For instance, Shevade et al [13] proposed two modifications of the SMO algorithm from Platt [12] for regression, based on a previous paper from the same team [7] for classification problems. Laskov [8] proposed also a decomposition method for regression problems which is very similar to the second modification proposed by Shevade et al. In fact, it is easy to see that Laskov's method with a subproblem of size 2 uses the same selection algorithm as well as the same termination criterion as Shevade et al.
Their method for selecting the working set is very similar to the one we show in this paper, but while we propose to select variables independently of their counterparts , they propose to select simultaneously pairs of variables . Even if this seems to be a small difference, let us note that at the optimal solution as well as during the optimization process, as proved by Lin [9] in the case of algorithms such as the one we propose here.4 Thus, one of the two variables or is always equal to 0, and choosing the and independently can thus help to quickly eliminate many variables, thanks to the shrinking phase,5 which, of course, has a direct impact on the speed of our algorithm. In fact, working with pairs of variables would force the algorithm to do many computations with null variables until the end of the optimization process.
Similarly, Smola and Schölkopf [14] also proposed earlier to use a decomposition algorithm for regression based on SMO, using an analytical solution for the subproblems, but again they proposed to select two pairs of variables (two and their corresponding ) instead of two variables as we propose in this paper.
Finally, Flake and Lawrence [3] proposed a modification of SMO for regression that uses the heuristics proposed by Platt [12] and those from Smola and Schölkopf [14], but their modification uses a new variable . Once again, this forces the use of the pairs during the computations.
The originality of our algorithm is thus to select independently the variables and , which has the side effect of efficiently adapting the shrinking step proposed in classification by Joachims [5] (it is indeed less easy to think of an efficient shrinking method in the context of pairs of variables). This also helps to simplify the resolution of the subproblem in the case where q=2. Finally, experiments given in section 3 suggest that this idea leads to faster convergence times.