OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

Abstract

Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure.

Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family.

Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that selection which causes their conservation is not always very strong.
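
The comparison against background set (iii), random sequences with the same nucleotide composition, can be illustrated with a minimal sketch. It assumes an independent-base model for the expected count of each k-mer; the abstract does not state the statistics the authors actually used. The function names (`kmer_counts`, `composition`, `enrichment`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count overlapping k-mers made only of A/C/G/T in a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def composition(seqs):
    """Single-nucleotide frequencies pooled over all sequences."""
    c = Counter(ch for s in seqs for ch in s.upper() if ch in "ACGT")
    total = sum(c.values())
    return {b: c[b] / total for b in "ACGT"}


def enrichment(seqs, k):
    """Observed/expected ratio per k-mer.

    Assumption: the expected count treats bases as independent draws from the
    pooled composition, a stand-in for composition-matched random sequences.
    """
    obs = kmer_counts(seqs, k)
    freq = composition(seqs)
    n = sum(obs.values())
    ratios = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        p = 1.0
        for b in kmer:
            p *= freq[b]
        expected = n * p
        if expected > 0:
            ratios[kmer] = obs[kmer] / expected
    return ratios


# Toy input (not real CNS data): AT-rich sequences with runs of A's and T's.
toy = ["AAAATTTTGGCC", "TTTTAAAAGCGC"]
r = enrichment(toy, 2)
```

A ratio above 1 marks a k-mer as more frequent than composition alone predicts. In the toy input, runs of identical bases push `r["AA"]` above 1, which mirrors the clumping of A's and T's described in the Results.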

Authors:
Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna
Publication Date:
2007-02-21
Research Org.:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Org.:
USDOE Director, Office of Science; National Institutes of Health
OSTI Identifier:
923415
Report Number(s):
LBNL-62510
R&D Project: GHPG6C; BnR: 400412000; TRN: US200804%%1117
DOE Contract Number:
DE-AC02-05CH11231; NIHU1HL66681B
Resource Type:
Journal Article
Resource Relation:
Journal Name: BMC Genomics; Journal Volume: 8; Journal Issue: 378; Related Information: Journal Publication Date: 10/18/2007
Country of Publication:
United States
Language:
English
Subject:
60; ABUNDANCE; AVOIDANCE; DNA; FUNCTIONALS; GENES; NUCLEOTIDES; TRANSCRIPTION; TRANSCRIPTION FACTORS

Citation Formats

Minovitsky, Simon, Stegmaier, Philip, Kel, Alexander, Kondrashov, Alexey S., and Dubchak, Inna. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences. United States: N. p., 2007. Web.
Minovitsky, Simon, Stegmaier, Philip, Kel, Alexander, Kondrashov, Alexey S., & Dubchak, Inna. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences. United States.
Minovitsky, Simon, Stegmaier, Philip, Kel, Alexander, Kondrashov, Alexey S., and Dubchak, Inna. 2007. "Short sequence motifs, overrepresented in mammalian conserved non-coding sequences". United States. https://www.osti.gov/servlets/purl/923415.
@article{osti_923415,
title = {Short sequence motifs, overrepresented in mammalian conserved non-coding sequences},
author = {Minovitsky, Simon and Stegmaier, Philip and Kel, Alexander and Kondrashov, Alexey S. and Dubchak, Inna},
abstractNote = {Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that selection which causes their conservation is not always very strong.},
doi = {},
journal = {BMC Genomics},
number = 378,
volume = 8,
place = {United States},
year = {2007},
month = feb
}