DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment

Abstract

Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, and computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity.

Results: We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale with high accuracy to large datasets (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA.

Conclusions: EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.
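The abstract describes EMMA's divide-and-conquer framework only at a high level. The Python sketch below illustrates, under stated assumptions, one way the final merge step of such a framework could work; it is not EMMA's code. It assumes each subset of the constraint alignment has already been extended with its query sequences (for example by MAFFT-linsi--add), and that for every extended sub-alignment we know which columns correspond to constraint (backbone) columns and which were newly inserted. The `col_map` representation and the names `merge_extended` and `induces_constraint` are hypothetical. Keeping inserted columns from different subsets side by side, rather than aligning them to one another, is also an assumption. `induces_constraint` checks the property a constraint alignment is meant to guarantee: restricting the output to the constraint sequences and dropping all-gap columns should give back the constraint alignment.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: sequence name -> aligned string (equal lengths).
Alignment = Dict[str, str]
# Hypothetical column map: entry j is the constraint (backbone) column that
# local column j corresponds to, or None if column j was inserted.
ColMap = List[Optional[int]]


def remove_all_gap_columns(aln: Alignment) -> Alignment:
    """Drop every column in which all rows have a gap ('-')."""
    names = list(aln)
    if not names:
        return {}
    length = len(aln[names[0]])
    keep = [i for i in range(length) if any(aln[n][i] != "-" for n in names)]
    return {n: "".join(aln[n][i] for i in keep) for n in names}


def induces_constraint(extended: Alignment, constraint: Alignment) -> bool:
    """True if `extended`, restricted to the constraint rows with all-gap
    columns removed, equals `constraint` (also with all-gap columns removed)."""
    if any(n not in extended for n in constraint):
        return False
    restricted = {n: extended[n] for n in constraint}
    return remove_all_gap_columns(restricted) == remove_all_gap_columns(constraint)


def merge_extended(backbone_len: int,
                   parts: List[Tuple[Alignment, ColMap]]) -> Alignment:
    """Merge extended sub-alignments that share a constraint backbone.

    Backbone columns are merged across parts; inserted columns from
    different parts are laid side by side and never aligned to each other.
    """
    # slots[s][k]: local columns of part k placed just before backbone column s
    # (slot backbone_len holds insertions after the last backbone column).
    slots = [[[] for _ in parts] for _ in range(backbone_len + 1)]
    # where[k][c]: local column of part k that holds backbone column c.
    where: List[Dict[int, int]] = [{} for _ in parts]
    for k, (_, col_map) in enumerate(parts):
        next_backbone = 0
        for j, c in enumerate(col_map):
            if c is None:
                slots[next_backbone][k].append(j)
            else:
                where[k][c] = j
                next_backbone = c + 1

    merged: Alignment = {}
    for k, (rows, _) in enumerate(parts):
        for name, seq in rows.items():
            out: List[str] = []
            for s in range(backbone_len + 1):
                for kk, cols in enumerate(slots[s]):
                    if kk == k:
                        out.extend(seq[j] for j in cols)
                    else:
                        out.append("-" * len(cols))  # another part's insertions
                if s < backbone_len:
                    j = where[k].get(s)
                    # A backbone column absent from this part is a gap here.
                    out.append(seq[j] if j is not None else "-")
            merged[name] = "".join(out)
    return merged


# Toy example: constraint rows A and B are split into two subsets, and each
# subset has been extended with one query (q1, q2).
constraint = {"A": "AC-T", "B": "ACGT"}
part1 = ({"A": "AC--T", "q1": "ACTGT"}, [0, 1, None, 2, 3])
part2 = ({"B": "ACGT", "q2": "A-GT"}, [0, 1, 2, 3])
merged = merge_extended(4, [part1, part2])
print(merged)  # {'A': 'AC--T', 'q1': 'ACTGT', 'B': 'AC-GT', 'q2': 'A--GT'}
print(induces_constraint(merged, constraint))  # True
```

Placing insertion columns from different subsets in separate columns is the conservative choice: it never asserts homology between query residues that were aligned in different subproblems, at the cost of a longer and gappier output alignment.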

Authors:
Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy
Publication Date:
December 7, 2023
Sponsoring Org.:
USDOE
OSTI Identifier:
2229173
Resource Type:
Published Article
Journal Name:
Algorithms for Molecular Biology
Additional Journal Information:
Journal Name: Algorithms for Molecular Biology; Journal Volume: 18; Journal Issue: 1; Journal ID: ISSN 1748-7188
Publisher:
Springer Science + Business Media
Country of Publication:
United Kingdom
Language:
English

Citation Formats

Shen, Chengze, Liu, Baqiao, Williams, Kelly P., and Warnow, Tandy. EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment. United Kingdom: N. p., 2023. Web. doi:10.1186/s13015-023-00247-x.
Shen, Chengze, Liu, Baqiao, Williams, Kelly P., & Warnow, Tandy. EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment. United Kingdom. https://doi.org/10.1186/s13015-023-00247-x
Shen, Chengze, Liu, Baqiao, Williams, Kelly P., and Warnow, Tandy. 2023. "EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment". United Kingdom. https://doi.org/10.1186/s13015-023-00247-x.
@article{osti_2229173,
title = {EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment},
author = {Shen, Chengze and Liu, Baqiao and Williams, Kelly P. and Warnow, Tandy},
abstractNote = {Abstract Background Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. Results We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA . Conclusions EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.},
doi = {10.1186/s13015-023-00247-x},
journal = {Algorithms for Molecular Biology},
number = 1,
volume = 18,
place = {United Kingdom},
year = {2023},
month = dec
}
