Short sequence motifs, overrepresented in mammalian conserved non-coding sequences
Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure.

Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to the start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family.

Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that the selection which causes their conservation is not always very strong.
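The motif comparisons summarized in the abstract can be illustrated with a small k-mer relative-abundance calculation. This is a minimal sketch, not the authors' pipeline: it assumes overlapping k-mer counting, models "random sequences with the same nucleotide composition" as an i.i.d. (zero-order) background that preserves only single-nucleotide frequencies, and compares a foreground set (e.g. CNSs) against a background set (e.g. non-CNSs) with pseudocount-smoothed frequency ratios. The function names (`relative_abundance`, `set_enrichment`, `cpg_obs_exp`) are hypothetical. The CpG observed/expected ratio shown is the commonly used count(CG) × length / (count(C) × count(G)) statistic, included because CpG avoidance is one of the compositional properties the abstract names.

```python
from collections import Counter
from itertools import product
from math import nan

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(b in ALPHABET for b in word):
            counts[word] += 1
    return counts


def base_frequencies(seq: str) -> dict:
    """Single-nucleotide composition of seq, ignoring non-ACGT symbols."""
    seq = seq.upper()
    n = sum(seq.count(b) for b in ALPHABET)
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: seq.count(b) / n for b in ALPHABET}


def relative_abundance(seq: str, k: int) -> dict:
    """Observed/expected ratio for every k-mer.

    Assumption: expected counts come from an i.i.d. (zero-order) sequence with
    the same single-nucleotide composition as seq. K-mers whose expected count
    is 0 (they use a base absent from seq) are omitted.
    """
    observed = kmer_counts(seq, k)
    freqs = base_frequencies(seq)
    positions = sum(observed.values())  # number of valid k-mer windows
    ratios = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        p = 1.0
        for b in word:
            p *= freqs[b]
        expected = p * positions
        if expected > 0:
            ratios[word] = observed[word] / expected
    return ratios


def set_enrichment(foreground, background, k: int, pseudocount: float = 1.0) -> dict:
    """Ratio of pooled k-mer frequencies in a foreground vs a background sequence set.

    A pseudocount keeps the ratio finite for k-mers absent from either set.
    """
    n_words = 4 ** k
    fg, bg = Counter(), Counter()
    for s in foreground:
        fg.update(kmer_counts(s, k))
    for s in background:
        bg.update(kmer_counts(s, k))
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    out = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        f = (fg[word] + pseudocount) / (fg_total + pseudocount * n_words)
        b = (bg[word] + pseudocount) / (bg_total + pseudocount * n_words)
        out[word] = f / b
    return out


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return nan
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    ratios = relative_abundance("AAAATTTT", 2)
    print(round(ratios["AA"], 3))   # 1.714: AA occurs more often than composition alone predicts
    print(cpg_obs_exp("AACCGGTT"))  # 2.0
```

On the toy sequence `AAAATTTT`, the dinucleotide `AA` is over-represented relative to the composition-matched expectation while `TA` is absent. This is the kind of signal that a clumping of A's and T's would produce. A dinucleotide-preserving background (e.g. shuffled sequences that keep dinucleotide counts) would be a stricter null. Which background the original study used for its random set is not specified here.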
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Director, Office of Science; National Institutes of Health
- DOE Contract Number:
- DE-AC02-05CH11231; NIH U1HL66681B
- OSTI ID:
- 923415
- Report Number(s):
- LBNL-62510; R&D Project: GHPG6C; BnR: 400412000; TRN: US200804%%1117
- Journal Information:
- BMC Genomics, Vol. 8, Issue 378; Related Information: Journal Publication Date: 10/18/2007
- Country of Publication:
- United States
- Language:
- English