DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Benchmarking of alignment-free sequence comparison methods

Abstract

Background: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. Results: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. Conclusion: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.

Authors:
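Zielezinski, Andrzej [1]; Girgis, Hani Z. [2]; Bernard, Guillaume [3]; Leimeister, Chris-Andre [4]; Tang, Kujin [5]; Dencker, Thomas [4]; Lau, Anna Katharina [4]; Röhling, Sophie [4]; Choi, Jae Jin [6]; Waterman, Michael S. [7]; Comin, Matteo [8]; Kim, Sung-Hou [6]; Vinga, Susana [9]; Almeida, Jonas S. [10]; Chan, Cheong Xin [11]; James, Benjamin T. [2]; Sun, Fengzhu [7]; Morgenstern, Burkhard [4]; Karlowski, Wojciech M. [1]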
  1. Adam Mickiewicz Univ., Poznan (Poland). Faculty of Biology. Dept. of Computational Biology
  2. Univ. of Tulsa, Tulsa, OK (United States). Tandy School of Computer Science
  3. Sorbonne Univ., Paris (France)
  4. Gottingen Univ. (Germany). Inst. of Microbiology and Genetics. Dept. of Bioinformatics
  5. Univ. of Southern California, Los Angeles, CA (United States). Quantitative and Computational Biology Program. Dept. of Biological Sciences
  6. Univ. of California, Berkeley, CA (United States). Dept. of Chemistry; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Molecular Biophysics & Integrated Bioimaging Division
  7. Univ. of Southern California, Los Angeles, CA (United States). Quantitative and Computational Biology Program. Dept. of Biological Sciences; Fudan Univ., Shanghai (China). School of Mathematical Sciences. Centre for Computational Systems Biology
  8. Univ. of Padua (Italy). Dept. of Information Engineering
  9. Univ. of Lisbon (Portugal). Inst. Superior Tecnico. INESC-ID. IDMEC
  10. National Inst. of Health (NIH), Bethesda, MD (United States). National Cancer Inst. Division of Cancer Epidemiology and Genetics (DCEG)
  11. Univ. of Queensland, Brisbane, QLD (Australia). School of Chemistry and Molecular Biosciences. Inst. for Molecular Bioscience
Publication Date:
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1626947
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Genome Biology (Online)
Additional Journal Information:
Journal Name: Genome Biology (Online); Journal Volume: 20; Journal Issue: 1; Journal ID: ISSN 1474-760X
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Biotechnology & Applied Microbiology; Genetics & Heredity; Alignment-free; Sequence comparison; Benchmark; Whole-genome phylogeny; Horizontal gene transfer; Web service

Citation Formats

Zielezinski, Andrzej, Girgis, Hani Z., Bernard, Guillaume, Leimeister, Chris-Andre, Tang, Kujin, Dencker, Thomas, Lau, Anna Katharina, Röhling, Sophie, Choi, Jae Jin, Waterman, Michael S., Comin, Matteo, Kim, Sung-Hou, Vinga, Susana, Almeida, Jonas S., Chan, Cheong Xin, James, Benjamin T., Sun, Fengzhu, Morgenstern, Burkhard, and Karlowski, Wojciech M. Benchmarking of alignment-free sequence comparison methods. United States: N. p., 2019. Web. doi:10.1186/s13059-019-1755-7.
Zielezinski, Andrzej, Girgis, Hani Z., Bernard, Guillaume, Leimeister, Chris-Andre, Tang, Kujin, Dencker, Thomas, Lau, Anna Katharina, Röhling, Sophie, Choi, Jae Jin, Waterman, Michael S., Comin, Matteo, Kim, Sung-Hou, Vinga, Susana, Almeida, Jonas S., Chan, Cheong Xin, James, Benjamin T., Sun, Fengzhu, Morgenstern, Burkhard, & Karlowski, Wojciech M. Benchmarking of alignment-free sequence comparison methods. United States. https://doi.org/10.1186/s13059-019-1755-7
Zielezinski, Andrzej, Girgis, Hani Z., Bernard, Guillaume, Leimeister, Chris-Andre, Tang, Kujin, Dencker, Thomas, Lau, Anna Katharina, Röhling, Sophie, Choi, Jae Jin, Waterman, Michael S., Comin, Matteo, Kim, Sung-Hou, Vinga, Susana, Almeida, Jonas S., Chan, Cheong Xin, James, Benjamin T., Sun, Fengzhu, Morgenstern, Burkhard, and Karlowski, Wojciech M. 2019. "Benchmarking of alignment-free sequence comparison methods". United States. https://doi.org/10.1186/s13059-019-1755-7. https://www.osti.gov/servlets/purl/1626947.
@article{osti_1626947,
title = {Benchmarking of alignment-free sequence comparison methods},
author = {Zielezinski, Andrzej and Girgis, Hani Z. and Bernard, Guillaume and Leimeister, Chris-Andre and Tang, Kujin and Dencker, Thomas and Lau, Anna Katharina and Röhling, Sophie and Choi, Jae Jin and Waterman, Michael S. and Comin, Matteo and Kim, Sung-Hou and Vinga, Susana and Almeida, Jonas S. and Chan, Cheong Xin and James, Benjamin T. and Sun, Fengzhu and Morgenstern, Burkhard and Karlowski, Wojciech M.},
abstractNote = {Background: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. Results: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. Conclusion: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions},
doi = {10.1186/s13059-019-1755-7},
journal = {Genome Biology (Online)},
number = 1,
volume = 20,
place = {United States},
year = {2019},
month = {7}
}

Works referencing / citing this record:

Whole-proteome tree of life suggests a deep burst of organism diversity
journal, February 2020

  • Choi, JaeJin; Kim, Sung-Hou
  • Proceedings of the National Academy of Sciences, Vol. 117, Issue 7
  • DOI: 10.1073/pnas.1915766117

‘Multi-SpaM’: a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees
journal, October 2019

  • Dencker, Thomas; Leimeister, Chris-André; Gerth, Michael
  • NAR Genomics and Bioinformatics, Vol. 2, Issue 1
  • DOI: 10.1093/nargab/lqz013

Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage
journal, December 2019

  • Lau, Anna-Katharina; Dörrer, Svenja; Leimeister, Chris-André
  • BMC Bioinformatics, Vol. 20, Issue S20
  • DOI: 10.1186/s12859-019-3205-7

Unblended disjoint tree merging using GTM improves species tree estimation
journal, April 2020


Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression
journal, December 2019


Reads Binning Improves Alignment-Free Metagenome Comparison
journal, November 2019


Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements
journal, May 2020


Graph Theory-Based Sequence Descriptors as Remote Homology Predictors
journal, December 2019

  • Agüero-Chapin, Guillermin; Galpert, Deborah; Molina-Ruiz, Reinaldo
  • Biomolecules, Vol. 10, Issue 1
  • DOI: 10.3390/biom10010026

S-conLSH: alignment-free gapped mapping of noisy long reads
journal, February 2021

  • Chakraborty, Angana; Morgenstern, Burkhard; Bandyopadhyay, Sanghamitra
  • BMC Bioinformatics, Vol. 22, Issue 1
  • DOI: 10.1186/s12859-020-03918-3

Benchmarking comes of age
journal, October 2019


KITSUNE: A Tool for Identifying Empirically Optimal K-mer Length for Alignment-Free Phylogenomic Analysis
journal, September 2020

  • Pornputtapong, Natapol; Acheampong, Daniel A.; Patumcharoenpol, Preecha
  • Frontiers in Bioengineering and Biotechnology, Vol. 8
  • DOI: 10.3389/fbioe.2020.556413
