EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment
Abstract
Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity.
Results: We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA.
Conclusions: EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.
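The problem described above, adding unaligned sequences into a constraint alignment, carries one hard requirement: the constraint alignment must come out unchanged. The Python sketch below is our own illustration of that requirement, not code from EMMA or MAFFT--add; the dict-of-strings alignment representation and the helper names `restrict` and `preserves_constraint` are assumptions. Because an added sequence may open columns that are gaps in every constraint row, the check compares induced alignments (all-gap columns removed) rather than raw rows.

```python
# Hypothetical helpers for illustration; not part of EMMA or MAFFT.
GAP = "-"


def restrict(alignment: dict[str, str], names: list[str]) -> list[str]:
    """Induced sub-alignment: keep the named rows, then drop all-gap columns."""
    rows = [alignment[n] for n in names]
    if not rows:
        return []
    width = len(rows[0])
    keep = [i for i in range(width) if any(row[i] != GAP for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]


def preserves_constraint(constraint: dict[str, str],
                         extended: dict[str, str]) -> bool:
    """True if `extended` contains every constraint row and, restricted to
    those rows, induces exactly the constraint alignment."""
    if not set(constraint) <= set(extended):
        return False
    lengths = {len(row) for row in extended.values()}
    if len(lengths) != 1:
        return False  # every row of an alignment must have the same length
    names = sorted(constraint)
    return restrict(extended, names) == restrict(constraint, names)


# Toy constraint alignment on two sequences.
constraint = {"s1": "AC-GT", "s2": "ACTGT"}

# Query q1 added; one new column (index 1) is all-gap in s1 and s2.
good = {"s1": "A-C-GT", "s2": "A-CTGT", "q1": "AGC-G-"}

# Here s1 was realigned, so its constraint homologies changed.
bad = {"s1": "ACG-T", "s2": "ACTGT", "q1": "ACG-T"}

print(preserves_constraint(constraint, good))  # True
print(preserves_constraint(constraint, bad))   # False
```

A check like this is a cheap sanity test on the output of any add-to-alignment tool, since it does not depend on how the new columns were placed; it says nothing about whether the added sequences themselves were aligned accurately.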
- Authors:
- Shen, Chengze; Liu, Baqiao; Williams, Kelly P.; Warnow, Tandy
- Publication Date:
- December 7, 2023
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 2229173
- Resource Type:
- Published Article
- Journal Name:
- Algorithms for Molecular Biology
- Additional Journal Information:
- Journal Name: Algorithms for Molecular Biology; Journal Volume: 18; Journal Issue: 1; Journal ID: ISSN 1748-7188
- Publisher:
- Springer Science + Business Media
- Country of Publication:
- United Kingdom
- Language:
- English
Citation Formats
Shen, Chengze, Liu, Baqiao, Williams, Kelly P., and Warnow, Tandy. EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment. United Kingdom: N. p., 2023. Web. doi:10.1186/s13015-023-00247-x.
Shen, Chengze, Liu, Baqiao, Williams, Kelly P., & Warnow, Tandy. EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment. United Kingdom. https://doi.org/10.1186/s13015-023-00247-x
Shen, Chengze, Liu, Baqiao, Williams, Kelly P., and Warnow, Tandy. 2023. "EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment". United Kingdom. https://doi.org/10.1186/s13015-023-00247-x.
@article{osti_2229173,
title = {EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment},
author = {Shen, Chengze and Liu, Baqiao and Williams, Kelly P. and Warnow, Tandy},
abstractNote = {Abstract Background Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. Results We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA . Conclusions EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.},
doi = {10.1186/s13015-023-00247-x},
journal = {Algorithms for Molecular Biology},
number = 1,
volume = 18,
place = {United Kingdom},
year = {2023},
month = dec
}
Works referenced in this record:
WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment
journal, August 2022
- Shen, Chengze; Park, Minhyuk; Warnow, Tandy
- Journal of Computational Biology, Vol. 29, Issue 8
Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees
journal, July 2016
- Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka
- Bioinformatics, Vol. 32, Issue 21
Bridging the gap in RNA structure prediction
journal, April 2007
- Shapiro, Bruce A.; Yingling, Yaroslava G.; Kasprzak, Wojciech
- Current Opinion in Structural Biology, Vol. 17, Issue 2
MAGUS: Multiple sequence Alignment using Graph clUStering
journal, November 2020
- Smirnov, Vladimir; Warnow, Tandy
- Bioinformatics, Vol. 37, Issue 12
PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences
journal, May 2015
- Mirarab, Siavash; Nguyen, Nam; Guo, Sheng
- Journal of Computational Biology, Vol. 22, Issue 5
Mutual Information in Protein Multiple Sequence Alignments Reveals Two Classes of Coevolving Positions
journal, May 2005
- Gloor, Gregory B.; Martin, Louise C.; Wahl, Lindi M.
- Biochemistry, Vol. 44, Issue 19
UPP2: fast and accurate alignment of datasets with fragmentary sequences
journal, January 2023
- Park, Minhyuk; Ivanovic, Stefan; Chu, Gillian
- Bioinformatics, Vol. 39, Issue 1
HMMerge: an ensemble method for multiple sequence alignment
journal, January 2023
- Park, Minhyuk; Warnow, Tandy
- Bioinformatics Advances, Vol. 3, Issue 1
Adding unaligned sequences into an existing alignment using MAFFT and LAST
journal, September 2012
- Katoh, Kazutaka; Frith, Martin C.
- Bioinformatics, Vol. 28, Issue 23
MAGUS+eHMMs: improved multiple sequence alignment accuracy for fragmentary sequences
journal, November 2021
- Shen, Chengze; Zaharias, Paul; Warnow, Tandy
- Bioinformatics, Vol. 38, Issue 4
Rose: generating sequence families
journal, March 1998
- Stoye, J.; Evers, D.; Meyer, F.
- Bioinformatics, Vol. 14, Issue 2
PASTA for proteins
journal, June 2018
- Collins, Kodi; Warnow, Tandy
- Bioinformatics, Vol. 34, Issue 22
Ultra-large alignments using phylogeny-aware profiles
journal, June 2015
- Nguyen, Nam-phuong D.; Mirarab, Siavash; Kumar, Keerthana
- Genome Biology, Vol. 16, Issue 1
Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega
journal, January 2011
- Sievers, Fabian; Wilm, Andreas; Dineen, David
- Molecular Systems Biology, Vol. 7, Issue 1
Highly accurate protein structure prediction with AlphaFold
journal, July 2021
- Jumper, John; Evans, Richard; Pritzel, Alexander
- Nature, Vol. 596, Issue 7873
WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity
journal, January 2023
- Liu, Baqiao; Warnow, Tandy
- Bioinformatics Advances, Vol. 3, Issue 1
FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments
journal, March 2010
- Price, Morgan N.; Dehal, Paramvir S.; Arkin, Adam P.
- PLoS ONE, Vol. 5, Issue 3
FastSP: linear time calculation of alignment accuracy
journal, October 2011
- Mirarab, Siavash; Warnow, Tandy
- Bioinformatics, Vol. 27, Issue 23
SATé-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees
journal, December 2011
- Liu, Kevin; Warnow, Tandy J.; Holder, Mark T.
- Systematic Biology, Vol. 61, Issue 1
Prodigal: prokaryotic gene recognition and translation initiation site identification
journal, March 2010
- Hyatt, Doug; Chen, Gwo-Liang; LoCascio, Philip F.
- BMC Bioinformatics, Vol. 11, Issue 1
Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees
journal, June 2009
- Liu, K.; Raghavan, S.; Nelesen, S.
- Science, Vol. 324, Issue 5934
A Comprehensive Benchmark Study of Multiple Sequence Alignment Methods: Current Challenges and Future Perspectives
journal, March 2011
- Thompson, Julie D.; Linard, Benjamin; Lecompte, Odile
- PLoS ONE, Vol. 6, Issue 3
Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization
journal, December 2015
- Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari
- Molecular Biology and Evolution, Vol. 33, Issue 4
The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs
journal, January 2002
- Cannone, Jamie J.; Subramanian, Sankar; Schnare, Murray N.
- BMC Bioinformatics, Vol. 3, Issue 1
Pfam: The protein families database in 2021
journal, October 2020
- Mistry, Jaina; Chuguransky, Sara; Williams, Lowri
- Nucleic Acids Research, Vol. 49, Issue D1
INDELible: A Flexible Simulator of Biological Sequence Evolution
journal, May 2009
- Fletcher, W.; Yang, Z.
- Molecular Biology and Evolution, Vol. 26, Issue 8
Multiple sequence alignment for phylogenetic purposes
journal, January 2006
- Morrison, David A.
- Australian Systematic Botany, Vol. 19, Issue 6