DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: GATA: a graphic alignment tool for comparative sequence analysis

Journal Article · BMC Bioinformatics
 [1];  [1]
  1. Univ. of California, Berkeley, CA (United States). Dept. of Molecular and Cell Biology; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Life Sciences Division, Dept. of Genome Science

Background: Several problems exist with current methods for aligning DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein-coding sequences: functional constraints on proteins provide strong selective pressure against sequence inversions and minimize sequence duplications and feature shuffling. For noncoding sequences, however, the collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate noncoding sequence relatedness, but dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, neither dot plots nor the text output of dynamic programming aligners provides an intuitive means of visualizing DNA alignments.

Results: To address some of these issues, we created a stand-alone, platform-independent graphic alignment tool for comparative sequence analysis (GATA, http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. Each sub-alignment is graphed as two shaded boxes, one for each sequence, each positioned in the coordinate system of its parent sequence and connected to the other by a line. Shading and colour indicate score and orientation. A variety of options exist for querying, modifying, and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file.

Conclusions: GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool well suited to pairwise analysis of non-coding sequences 0–200 kb in length. It functions independently of sequence feature ordering or orientation and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Because the alignment is visual and contains no gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation that is well suited for comparative sequence analysis.
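The abstract describes two steps: keeping every BLASTN sub-alignment above a low cut-off score, and drawing each kept hit as a pair of boxes whose shading encodes score and whose colour encodes orientation. The following Python sketch illustrates those two steps under stated assumptions; it is not GATA's implementation. It assumes the hits arrive as BLAST tabular output (the 12-column layout written by BLAST+ with `-outfmt 6`), uses the bit score as the score, and treats a hit whose subject start exceeds its subject end as inverted. The names `SubAlignment`, `parse_blast_tabular`, and `to_glyphs`, the linear grey-scale mapping, and the red/blue colour choice are all hypothetical; GATA's own post-processing and drawing rules are not described in this record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubAlignment:
    """One local hit between sequence 1 (BLAST query) and sequence 2 (BLAST subject)."""
    q_start: int
    q_end: int
    s_start: int     # smaller subject coordinate, whatever the strand
    s_end: int       # larger subject coordinate
    score: float     # BLAST bit score
    inverted: bool   # True if the hit lies on the minus strand of sequence 2


def parse_blast_tabular(lines, min_score):
    """Keep every hit in 12-column BLAST tabular output whose bit score >= min_score.

    Assumed column order (BLAST+ -outfmt 6 default): qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        q_start, q_end, s_start, s_end = (int(c) for c in cols[6:10])
        score = float(cols[11])
        if score < min_score:
            continue  # below the (hypothetical) low cut-off
        hits.append(SubAlignment(q_start, q_end, min(s_start, s_end),
                                 max(s_start, s_end), score, s_start > s_end))
    return hits


def to_glyphs(hits, min_score):
    """Describe each hit as a box on each sequence plus a connecting line.

    Shade runs linearly from 0.0 (at min_score) to 1.0 (best hit); colour flags
    orientation. Both encodings are illustrative choices, not GATA's.
    """
    if not hits:
        return []
    span = max(h.score for h in hits) - min_score
    glyphs = []
    for h in hits:
        shade = 1.0 if span <= 0 else (h.score - min_score) / span
        glyphs.append({
            "seq1_box": (h.q_start, h.q_end),
            "seq2_box": (h.s_start, h.s_end),
            # the line joins the box midpoints, each in its own sequence's coordinates
            "line": ((h.q_start + h.q_end) / 2, (h.s_start + h.s_end) / 2),
            "shade": round(shade, 3),
            "colour": "red" if h.inverted else "blue",
        })
    return glyphs


# Hypothetical demo hits (not real data): forward, inverted, and below the cut-off.
demo = [
    "seq1\tseq2\t92.31\t26\t2\t0\t101\t126\t501\t526\t1e-05\t40.1",
    "seq1\tseq2\t88.00\t25\t3\t0\t300\t324\t950\t926\t0.002\t32.2",
    "seq1\tseq2\t80.00\t20\t4\t0\t700\t719\t40\t59\t5.0\t18.0",
]
hits = parse_blast_tabular(demo, min_score=20.0)
for glyph in to_glyphs(hits, min_score=20.0):
    print(glyph)
```

Because hits are filtered only by score and never chained into a single collinear path, the sketch keeps overlapping, out-of-order, and inverted hits alike, which is the property the Conclusions emphasise for noncoding sequence.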

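GATA adds gene annotation to both sequences from General Feature Format files. As a hedged illustration of what that involves, the sketch below parses the nine tab-separated GFF columns and finds the features that overlap a conserved region, so that a hit can be labelled with, for example, the exon or enhancer it falls in. The names `Feature`, `parse_gff`, and `overlapping` are hypothetical, and the GFF version GATA accepts is not stated in this record; the attribute column is kept as a raw string because GFF2 and GFF3 format it differently. The sketch does not handle an embedded `##FASTA` section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    seqid: str
    source: str
    ftype: str               # GFF column 3, e.g. "exon"
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column holds "."
    strand: str
    attributes: str          # left unparsed: GFF2 and GFF3 use different syntax


def parse_gff(lines) -> List[Feature]:
    """Parse GFF feature lines, skipping blank lines and '#' comments/directives."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}: {line!r}")
        seqid, source, ftype, start, end, score, strand, _phase, attrs = cols
        features.append(Feature(seqid, source, ftype, int(start), int(end),
                                None if score == "." else float(score), strand, attrs))
    return features


def overlapping(features, start, end):
    """Return features sharing at least one base with the closed interval [start, end]."""
    return [f for f in features if f.start <= end and start <= f.end]


# Hypothetical annotation for sequence 1 (not real data).
gff = [
    "##gff-version 3",
    "seq1\tdemo\texon\t90\t140\t.\t+\t.\tID=exon1",
    "seq1\tdemo\tenhancer\t600\t800\t.\t+\t.\tID=enh1",
]
features = parse_gff(gff)
print([f.attributes for f in overlapping(features, 101, 126)])
```

GFF and BLAST both report 1-based, inclusive coordinates, so annotation intervals and sub-alignment intervals can be compared directly without conversion.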
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1626311
Journal Information:
BMC Bioinformatics, Vol. 6, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English

Cited By (15)

Comparative Genomics of Cereal Crops: Status and Future Prospects (book, January 2014)
Characterization of the glutathione S-transferase (GST) gene family in Pyrus bretschneideri and their expression pattern upon superficial scald development (journal, June 2018)
Structural genomics and transcriptional characterization of the Dormancy-Associated MADS-box genes during bud dormancy progression in apple (journal, April 2016)
Identification of eukaryotic translation initiation factors and the temperature-dependent nature of Turnip mosaic virus epidemics in allopolyploid Brassica juncea (journal, January 2020)
Evolutionary conservation of cold-induced antisense RNAs of FLOWERING LOCUS C in Arabidopsis thaliana perennial relatives (journal, July 2014)
Genome sequencing and comparative genomics reveal the potential pathogenic mechanism of Cercospora sojina Hara on soybean (journal, September 2017)
Transposons played a major role in the diversification between the closely related almond (Prunus dulcis) and peach (P. persica) genomes: Results from the almond genome sequence (posted content, June 2019)
Evolution of the large genome in Capsicum annuum occurred through accumulation of single-type long terminal repeat retrotransposons and their derivatives: Mechanism of genome expansion in pepper (journal, December 2011)
Divergence of annual and perennial species in the Brassicaceae and the contribution of cis-acting variation at FLC orthologues (journal, March 2017)
Genome sequence of Valsa canker pathogens uncovers a potential adaptation of colonization of woody bark (journal, July 2015)
GenomeMatcher: A graphical user interface for DNA sequence comparison (journal, September 2008)
Genomic characteristics and comparative genomics analysis of the endophytic fungus Sarocladium brachiariae (journal, October 2019)
Polyamines in the life of Arabidopsis: profiling the expression of S-adenosylmethionine decarboxylase (SAMDC) gene family during its life cycle (journal, December 2017)
Genome-wide analysis of polygalacturonase gene family from pear genome and identification of the member involved in pear softening (journal, December 2019)
Genome-Wide Analysis of the Glutathione S-Transferase Gene Family in Capsella rubella: Identification, Expression, and Biochemical Functions (journal, August 2016)