Summary: Analysis of SAGE Results with Combined
Hsuan-Tien Lin and Ling Li
Learning Systems Group, California Institute of Technology, USA
Abstract. Serial analysis of gene expression (SAGE) experiments can
provide the expression levels of thousands of genes in a biological
sample. However, the number of available samples is relatively small.
Such an undersampled problem needs to be carefully analyzed from a
machine learning perspective. In this paper, we combine several
state-of-the-art techniques for classification, feature selection, and
error estimation, and evaluate the performance of the combined techniques
on the SAGE dataset. Our results show that a novel algorithm, the support
vector machine with the stump kernel, performs well on the SAGE dataset,
both for building an accurate classifier and for selecting relevant features.
Serial analysis of gene expression (SAGE) experiments can provide an
enormous amount of information on the expression levels of different genes in
some cell populations. The expression level is evaluated by counting the
occurrences of the SAGE tags that can identify a unique transcript.
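To make the counting step concrete, the following is a minimal sketch, assuming the tags extracted from one sample are already available as a list of strings. The function name `count_tags` and the tag sequences are hypothetical and invented for illustration; they are not taken from the SAGE dataset or from any particular SAGE software.

```python
from collections import Counter


def count_tags(tags):
    """Return a mapping from each tag to its number of occurrences.

    `tags` is assumed to be an iterable of tag strings observed in one
    sample (one SAGE library).
    """
    return Counter(tags)


# Hypothetical tag sequences, made up for illustration only.
sample = ["AAACCCGGGT", "TTTGGGCCCA", "AAACCCGGGT", "AAACCCGGGT"]
counts = count_tags(sample)

# Each count serves as the raw expression level of the transcript
# identified by that tag.
for tag, n in sorted(counts.items()):
    print(tag, n)
```

Under this view, each sample becomes a vector of tag counts, with one feature per distinct tag, which is the kind of input a classifier or feature selection method would operate on.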