OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S; Dubchak, Inna
Publication Date:
2007
Research Org.:
Joint Genome Institute (JGI)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
Resource Type:
Journal Article
Resource Relation:
Journal Name: BMC Genomics; Journal Volume: 8; Journal Issue: 1
Country of Publication:
United States

Citation Formats

Minovitsky, Simon, Stegmaier, Philip, Kel, Alexander, Kondrashov, Alexey S, and Dubchak, Inna. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences. United States: N. p., 2007. Web. doi:10.1186/1471-2164-8-378.
Minovitsky, Simon, Stegmaier, Philip, Kel, Alexander, Kondrashov, Alexey S, & Dubchak, Inna. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences. United States. doi:10.1186/1471-2164-8-378.
Minovitsky, Simon, Stegmaier, Philip, Kel, Alexander, Kondrashov, Alexey S, and Dubchak, Inna. 2007. "Short sequence motifs, overrepresented in mammalian conserved non-coding sequences". United States. doi:10.1186/1471-2164-8-378.
title = {Short sequence motifs, overrepresented in mammalian conserved non-coding sequences},
author = {Minovitsky, Simon and Stegmaier, Philip and Kel, Alexander and Kondrashov, Alexey S and Dubchak, Inna},
abstractNote = {},
doi = {10.1186/1471-2164-8-378},
journal = {BMC Genomics},
number = 1,
volume = 8,
place = {United States},
year = {2007}
  • Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to the start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, the abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those that flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that the selection which causes their conservation is not always very strong.
  • Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes composed of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.
  • The authors have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome.
  • The context-dependent expression of genes is at the core of biological activity, and significant attention has been given to identifying the various factors that contribute to gene expression at the genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs, or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study, multiple regression analyses relating mRNA abundance to all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variation in mRNA expression, and in what manner they act together...
  • All class II major histocompatibility complex genes contain two highly conserved sequences, termed X and Y, within the promoter region(s), which may have a role in regulation of expression. To study trans-acting factors that interact with these sequences, sequence-specific DNA binding activity was examined by the gel electrophoresis retardation assay using the HLA-DQ2β gene 5' flanking DNA and nuclear extracts derived from various cell types. Several specific protein-binding activities were found using a 45-base-pair (bp) HinfI/Sau96I (-142 to -98 bp) and a 38-bp Sau96I/Sau96I (-97 to -60 bp) fragment, which include conserved sequence X (-113 to -100 bp) and conserved sequence Y (-80 to -71 bp), respectively. Competition experiments, methylation interference analysis, and DNase I footprinting demonstrated that distinct proteins in a nuclear extract of Raji cells (a human B lymphoma line) bind to sequence X, to sequence Y, and to DNA 5' of the X sequence (termed sequence W). The factor binding site in the W sequence is also found to be conserved among β-chain genes and is suggested to be a γ-interferon control region.
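The CNS abstract at the top of the list compares short-motif abundances in CNSs against several backgrounds, one of which is random sequence with the same nucleotide composition, and it attributes much of the signal to AT-richness and CpG avoidance. The Python below is a minimal sketch of that kind of comparison under stated assumptions; it is not the authors' pipeline. It assumes plain A/C/G/T strings, builds the composition-matched background by shuffling each sequence (one simple way to preserve base composition), scores each k-mer by a pseudocounted frequency ratio, and computes the conventional CpG observed/expected ratio. The function names (`kmer_counts`, `shuffled_background`, `enrichment`, `cpg_obs_exp`), the pseudocount, and the toy sequences are hypothetical choices for illustration.

```python
# Minimal sketch (hypothetical helper names): k-mer enrichment of a target
# set against a composition-matched shuffled background, plus CpG o/e.
import random
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_counts(seqs, k):
    """Count overlapping k-mers, skipping any window with a non-ACGT symbol."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if all(ch in BASES for ch in word):
                counts[word] += 1
    return counts


def shuffled_background(seqs, n_copies=10, seed=0):
    """Return n_copies random permutations of each sequence.

    A mononucleotide shuffle keeps each sequence's base composition exactly
    but destroys its order (and therefore its dinucleotide structure)."""
    rng = random.Random(seed)
    out = []
    for s in seqs:
        letters = list(s.upper())
        for _ in range(n_copies):
            rng.shuffle(letters)
            out.append("".join(letters))
    return out


def enrichment(target, background, k, pseudo=1.0):
    """Per-k-mer ratio of target frequency to background frequency.

    A pseudocount is added to every possible k-mer so that words absent
    from the background do not cause division by zero."""
    t, b = kmer_counts(target, k), kmer_counts(background, k)
    n_words = 4 ** k
    t_total = sum(t.values()) + pseudo * n_words
    b_total = sum(b.values()) + pseudo * n_words
    words = ("".join(p) for p in product(BASES, repeat=k))
    return {w: ((t[w] + pseudo) / t_total) / ((b[w] + pseudo) / b_total)
            for w in words}


def cpg_obs_exp(seq):
    """CpG observed/expected ratio: N(CG) * length / (N(C) * N(G))."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return s.count("CG") * len(s) / (c * g)  # "CG" cannot overlap itself


if __name__ == "__main__":
    # Made-up toy sequences, not data from the study.
    target = ["AAAATTTTGGCCAAAATTTTGC", "TTTTAAAAGGCCTTTTAAAACG"]
    background = shuffled_background(target, n_copies=50)
    scores = enrichment(target, background, k=4)
    for word in sorted(scores, key=scores.get, reverse=True)[:5]:
        print(word, round(scores[word], 2))
    print("CpG o/e:", [round(cpg_obs_exp(s), 2) for s in target])
```

Ratios above 1 mark words more frequent in the target than in its shuffled counterpart. Because a mononucleotide shuffle preserves base composition but not neighbor structure, the shuffled background carries CpG at roughly its compositional expectation, so a CpG-depleted target (CpG o/e well below 1) will show CpG-containing words as underrepresented against it. A dinucleotide-preserving shuffle would be a stricter background; that choice is left open here.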
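The Desulfovibrio vulgaris abstract describes multiple regression of mRNA abundance on coding and non-coding sequence features, but the truncated text does not give the model. As a hedged illustration only, the sketch below fits ordinary least squares in plain Python via the normal equations and reports R², so that a coding-only, a non-coding-only, and a combined feature set can be compared by how much variance each explains. The helpers `solve` and `ols`, the two features, and the toy values are hypothetical and are not taken from the study.

```python
# Hedged sketch: ordinary least squares in plain Python, used to compare how
# much variance in (hypothetical) mRNA abundance different feature sets explain.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting. Inputs are copied, not modified."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def ols(rows, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp via the normal equations
    (X^T X) b = X^T y. Returns (coefficients, R^2), intercept first."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y is constant; R^2 is undefined")
    return beta, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values for six hypothetical genes (not data from the study):
    # a coding feature (e.g. a codon-usage index) and a non-coding feature
    # (e.g. an upstream-motif score).
    coding = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90]
    noncoding = [1.0, 0.0, 2.0, 1.0, 3.0, 0.5]
    mrna = [2.1, 2.5, 1.6, 2.8, 1.3, 3.0]
    for name, rows in [("coding", [(v,) for v in coding]),
                       ("non-coding", [(v,) for v in noncoding]),
                       ("both", list(zip(coding, noncoding)))]:
        _, r2 = ols(rows, mrna)
        print(f"{name:>10}: R^2 = {r2:.3f}")
```

Because the combined model nests each single-feature model, its R² can never be lower than either one; the gap between them is one rough way to ask how much the two feature classes contribute jointly. For real genome-scale data a numerical library would be preferable to forming X^T X by hand, which can be ill-conditioned.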