On utility of gene set signatures in gene expression-based cancer class prediction
Minca Mramor, Marko Toplak, Gregor Leban, Tomaž Curk, Janez Demšar, Blaž Zupan;
JMLR W&CP 8:55-64, 2010.
Abstract
Machine learning methods that can use additional knowledge in their
inference process are central to the development of integrative
bioinformatics. Inclusion of background knowledge improves robustness,
predictive accuracy and interpretability. Recently, several such
techniques have been proposed that use information on gene sets for
supervised data mining of class-labeled microarray data sets. Here we
present a new gene set-based supervised learning approach named setsig
and systematically investigate the predictive
accuracy of this and other gene set approaches compared to the
standard inference model where only gene expression information is
used. Our results indicate that setsig outperforms other gene set
approaches but that, contrary to earlier reports, transformation
of gene expression data to the space of gene set signatures does not
result in increased accuracy of predictive models when compared to
those trained directly on the original (untransformed) data.