## Semi-Supervised Eigenvectors for Large-Scale Locally-Biased Learning

** Toke J. Hansen, Michael W. Mahoney**; 15(Nov):3871−3914, 2014.

### Abstract

In many applications, one has side information, *e.g.*,
labels that are provided in a semi-supervised manner, about a
specific target region of a large data set, and one wants to
perform machine learning and data analysis tasks "nearby" that
prespecified target region. For example, one might be interested
in the clustering structure of a data graph near a prespecified
"seed set" of nodes, or one might be interested in finding
partitions in an image that are near a prespecified "ground
truth" set of pixels. Locally-biased problems of this sort are
particularly challenging for popular eigenvector-based machine
learning and data analysis tools. At root, the reason is that
eigenvectors are inherently global quantities, thus limiting the
applicability of eigenvector-based methods in situations where
one is interested in very local properties of the data.

In this paper, we address this issue by providing a
methodology to construct *semi-supervised eigenvectors* of
a graph Laplacian, and we illustrate how these locally-biased
eigenvectors can be used to perform *locally-biased machine
learning*. These semi-supervised eigenvectors capture
successively-orthogonalized directions of maximum variance,
conditioned on being well-correlated with an input seed set of
nodes that is assumed to be provided in a semi-supervised
manner. We show that these semi-supervised eigenvectors can be
computed quickly as the solution to a system of linear
equations; and we also describe several variants of our basic
method that have improved scaling properties. We provide several
empirical examples demonstrating how these semi-supervised
eigenvectors can be used to perform locally-biased learning; and
we discuss the relationship between our results and recent
machine learning algorithms that use global eigenvectors of the
graph Laplacian.