OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Classification of non-coding RNA using graph representations of secondary structure

Conference · OSTI ID: 882083

Some genes produce transcripts that function directly in regulatory, catalytic, or structural roles in the cell. These non-coding RNAs are prevalent in all living organisms, and methods that aid the understanding of their functional roles are essential. RNA secondary structure, the pattern of base pairing, contains critical information for determining the three-dimensional structure and function of the molecule. In this work we examine whether the basic geometric and topological properties of secondary structure are sufficient to distinguish between RNA families in a learning framework. First, we develop a labeled dual graph representation of RNA secondary structure by adding biologically meaningful labels to the dual graphs proposed by Gan et al. [1]. Next, we define a similarity measure directly on the labeled dual graphs using the recently developed marginalized kernels [2]. Using this similarity measure, we trained Support Vector Machine classifiers to distinguish RNAs of known families from random RNAs with similar statistics. For 22 of the 25 families tested, the classifier achieved better than 70% accuracy, and considerably higher accuracy for some families. Training a set of classifiers to automatically assign family labels to RNAs using a one-vs.-all multi-class scheme also yielded encouraging results. From these initial learning experiments, we suggest that the labeled dual graph representation, together with kernel machine methods, has potential for use in automated analysis and classification of uncharacterized RNA molecules or in efficient genome-wide screens for RNA molecules from existing families.
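The pipeline outlined in the abstract (secondary structure → labeled graph → marginalized kernel → SVM) can be illustrated with a minimal Python sketch. It assumes nested dot-bracket input and builds a simplified stem-level graph in the spirit of the dual graphs of Gan et al. [1]; it is not necessarily their exact construction. The vertex label used here (stem length in base pairs) is a hypothetical stand-in for the paper's biologically meaningful labels, which the abstract does not specify. The kernel follows the random-walk formulation of marginalized kernels with a uniform start distribution, a constant stop probability `q`, and a delta kernel on vertex labels. The function names `stems_from_dot_bracket`, `dual_graph`, and `marginalized_kernel` and the default parameter values are illustrative only.

```python
def stems_from_dot_bracket(structure):
    """Group the base pairs of a nested dot-bracket string into stems.
    Returns (stem_of, lengths): stem_of[i] is the stem index of position i
    (None if unpaired); lengths[s] is the number of base pairs in stem s.
    Pseudoknot brackets such as [] or {} are not handled in this sketch.
    """
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unmatched '('")
    stem_of, lengths = [None] * len(structure), []
    for i, j in sorted(partner.items()):
        if j < i or stem_of[i] is not None:
            continue
        # New stem: extend inward while pairs stack directly ((a, b) -> (a+1, b-1)).
        sid, a, b = len(lengths), i, j
        while a < b and partner.get(a) == b:
            stem_of[a] = stem_of[b] = sid
            a, b = a + 1, b - 1
        lengths.append(a - i)
    return stem_of, lengths


def dual_graph(structure):
    """Hypothetical stem-level graph: (vertex_labels, edges).
    One vertex per stem, labeled by its length in base pairs; one edge
    (u, v, n_unpaired) per link between consecutive stem strands along the
    backbone. Dangling 5'/3' ends are ignored.
    """
    stem_of, lengths = stems_from_dot_bracket(structure)
    strands = []  # [stem_id, first_pos, last_pos] of each contiguous strand
    for pos, sid in enumerate(stem_of):
        if sid is None:
            continue
        if strands and strands[-1][0] == sid and strands[-1][2] == pos - 1:
            strands[-1][2] = pos
        else:
            strands.append([sid, pos, pos])
    edges = [(s1, s2, start2 - end1 - 1)
             for (s1, _, end1), (s2, start2, _) in zip(strands, strands[1:])]
    return lengths, edges


def _adjacency(graph):
    labels, edges = graph
    adj = [[] for _ in labels]
    for u, v, _ in edges:
        adj[u].append(v)
        if u != v:  # a self-loop (hairpin) is stored once
            adj[v].append(u)
    return adj


def marginalized_kernel(g1, g2, q=0.3, tol=1e-12, max_iter=10_000):
    """Expected vertex-label match of two random walks (uniform start,
    stop probability q, uniform moves to neighbours); edge labels ignored."""
    (lab1, _), (lab2, _) = g1, g2
    if not lab1 or not lab2:
        return 0.0
    adj1, adj2 = _adjacency(g1), _adjacency(g2)

    def stop(adj, v):  # a vertex with no neighbours must stop the walk
        return q if adj[v] else 1.0

    def k(a, b):  # delta kernel on vertex labels
        return 1.0 if lab1[a] == lab2[b] else 0.0

    pairs = [(a, b) for a in range(len(lab1)) for b in range(len(lab2))]
    r = {p: 0.0 for p in pairs}
    # Fixed-point iteration of
    #   r(a,b) = stop(a)stop(b) + sum_{u~a, w~b} t(u|a) t(w|b) k(u,w) r(u,w),
    # a contraction because each walk continues with probability <= 1 - q.
    for _ in range(max_iter):
        new = {}
        for a, b in pairs:
            val = stop(adj1, a) * stop(adj2, b)
            if adj1[a] and adj2[b]:
                t = (1 - q) ** 2 / (len(adj1[a]) * len(adj2[b]))
                val += t * sum(k(u, w) * r[u, w] for u in adj1[a] for w in adj2[b])
            new[a, b] = val
        delta = max(abs(new[p] - r[p]) for p in pairs)
        r = new
        if delta < tol:
            break
    start = 1.0 / (len(lab1) * len(lab2))  # uniform start vertex in each graph
    return start * sum(k(a, b) * r[a, b] for a, b in pairs)
```

For example, `dual_graph("((((...))))")` returns `([4], [(0, 0, 3)])`: a single 4-bp stem whose hairpin loop becomes a self-loop with 3 unpaired nucleotides. Comparing that graph with itself at `q = 0.3` gives approximately 0.1765 (that is, 0.09 / 0.51). Evaluating the kernel over all pairs of training structures yields a Gram matrix that could be passed to an SVM implementation accepting precomputed kernels (for instance scikit-learn's `SVC(kernel="precomputed")`), training one binary classifier per family for a one-vs.-all scheme.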

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Director, Office of Science; National Institutes of Health
DOE Contract Number:
DE-AC02-05CH11231; NIH 5 R01 HG002665-02
OSTI ID:
882083
Report Number(s):
LBNL-58161; R&D Project: SHSH02; TRN: US200613%%578
Resource Relation:
Conference: Pacific Symposium on Biocomputing 2005, Wailea, Maui, Jan. 3-7, 2005
Country of Publication:
United States
Language:
English