Asymptotic behavior of Support Vector Machine for spiked population model
Hanwen Huang; 18(45):1−21, 2017.
For the spiked population model, we investigate the asymptotic behavior of the Support Vector Machine (SVM) classification method when the dimension $N$ and the sample size $M$ both tend to infinity, $N,M\rightarrow\infty$, at a fixed ratio $\alpha=M/N$. We focus on the generalization performance by analytically evaluating the angle between the normal direction vector of the SVM separating hyperplane and that of the corresponding Bayes optimal separating hyperplane. This result is analogous to those shown in Paul (2007) and Nadler (2008) for the angle between the sample eigenvector and the population eigenvector in random matrix theory. We provide not just bounds but sharp predictions of the asymptotic behavior of SVM, which can be determined by a set of nonlinear equations. Based on the analytical results, we propose a new method of selecting the tuning parameter which significantly reduces the computational cost. A surprising finding is that SVM achieves its best performance at small values of the tuning parameter under the spiked population model. These results are confirmed by comparison with numerical simulations on finite-size systems. We also apply our formulas to an actual breast cancer dataset and find agreement between the analytical derivations and numerical computations based on cross validation.
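The quantity studied in the abstract, the angle between the SVM normal vector and the Bayes optimal direction, can be illustrated with a small finite-size simulation. The sketch below is only an illustration under assumptions that are not taken from the paper: two Gaussian classes with means $\pm\mu$, a covariance with a single spike along the first coordinate axis, a linear SVM without intercept trained by a simple batch subgradient method, and arbitrary values for `n_dim`, `alpha`, `spike`, and `lam`. Under these assumptions the Bayes direction is $\Sigma^{-1}\mu$, which is available in closed form because the assumed $\Sigma$ is diagonal.

```python
import math
import random


def sample_spiked_data(n_dim, n_samples, mu, spike, rng):
    """Draw labelled points x = y*mu + z, z ~ N(0, diag(1+spike, 1, ..., 1))."""
    xs, ys = [], []
    for _ in range(n_samples):
        y = rng.choice((-1.0, 1.0))
        z = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        z[0] *= math.sqrt(1.0 + spike)  # assumed single spike along axis 0
        xs.append([y * m + zi for m, zi in zip(mu, z)])
        ys.append(y)
    return xs, ys


def train_linear_svm(xs, ys, lam, n_iter):
    """Batch subgradient descent on lam/2*||w||^2 + mean hinge loss (no intercept)."""
    n_dim, n_samples = len(xs[0]), len(xs)
    w = [0.0] * n_dim
    for t in range(1, n_iter + 1):
        grad = [lam * wj for wj in w]
        for x, y in zip(xs, ys):
            # Points with margin below 1 contribute to the hinge subgradient.
            if y * sum(wj * xj for wj, xj in zip(w, x)) < 1.0:
                for j in range(n_dim):
                    grad[j] -= y * x[j] / n_samples
        eta = 1.0 / (lam * t)  # decaying step size
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


rng = random.Random(0)
n_dim, alpha = 20, 4            # illustrative sizes; alpha = M / N
n_samples = alpha * n_dim
spike = 3.0                     # assumed spike strength
mu = [2.0 / math.sqrt(n_dim)] * n_dim

xs, ys = sample_spiked_data(n_dim, n_samples, mu, spike, rng)
w_svm = train_linear_svm(xs, ys, lam=0.1, n_iter=300)

# Bayes direction Sigma^{-1} mu for the assumed diagonal covariance.
w_bayes = [mu[0] / (1.0 + spike)] + mu[1:]

cos_angle = cosine(w_svm, w_bayes)
print(f"cosine of angle between SVM and Bayes directions: {cos_angle:.3f}")
```

Repeating the run over a grid of `lam` values and larger `n_dim` at fixed `alpha` gives a rough empirical view of how the angle depends on the tuning parameter. It does not substitute for the nonlinear equations mentioned in the abstract.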
© JMLR 2017.