Convex and Scalable Weakly Labeled SVMs

Yu-Feng Li; Ivor W. Tsang; James T. Kwok; Zhi-Hua Zhou

In this paper, we study the problem of learning from weakly labeled data, where labels of the training examples are incomplete. This includes, for example, (i) semi- supervised learning where labels are partially known; (ii) multi-instance learning where labels are implicitly known; and (iii) clustering where labels are completely unknown. Unlike supervised learning, learning with weak labels involves a difficult Mixed-Integer Programming (MIP) problem. Therefore, it can suffer from poor scalability and may also get stuck in local minimum. In this paper, we focus on SVMs and propose the WELLSVM via a novel label generation strategy. This leads to a convex relaxation of the original MIP, which is at least as tight as existing convex Semi-Definite Programming (SDP) relaxations. Moreover, the WELLSVM can be solved via a sequence of SVM subproblems that are much more scalable than previous convex SDP relaxations. Experiments on three weakly labeled learning tasks, namely, (i) semi-supervised learning; (ii) multi-instance learning for locating regions of interest in content-based information retrieval; and (iii) clustering, clearly demonstrate improved performance, and WELLSVM is also readily applicable on large data sets.

Convex and Scalable Weakly Labeled SVMs

Abstract