Title: Comparing sequences without using alignments: application to HIV/SIV subtyping

Journal Article · BMC Bioinformatics
 [1];  [2];  [2];  [3];  [4];  [4];  [4]
  1. Unite Mixte de Recherche (UMR), Marseille Cedex (France). Centre National de la Recherche Scientifique (CNRS). Inst. Mathematique de Luminy
  2. Laboratoire d'Informatique Fondamentale de Lille (LIFL), Villeneuve d'Ascq (France). Equipe Bioinfo
  3. Gottingen Univ. (Germany). Inst. of Microbiology and Genetics. Dept. of Bioinformatics; Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Theoretical Biology and Biophysics Group
  4. Unite Mixte de Recherche (UMR), Evry (France). Centre National de la Recherche Scientifique (CNRS). Lab. Statistique et Genome

Background: In general, the construction of trees is based on sequence alignments. This procedure, however, leads to loss of information when parts of sequence alignments (for instance, ambiguous regions) are deleted before tree building. To overcome this difficulty, one of us previously introduced a new and rapid algorithm that calculates dissimilarity matrices between sequences without preliminary alignment. Results: In this paper, HIV (Human Immunodeficiency Virus) and SIV (Simian Immunodeficiency Virus) sequence data are used to evaluate this method. The program produces tree topologies that are identical to those obtained by a combination of standard methods detailed in the HIV Sequence Compendium. Manual alignment editing is not necessary at any stage. Furthermore, only one user-specified parameter is needed for constructing trees. Conclusion: The extensive tests on HIV/SIV subtyping showed that the virus classifications produced by our method are in good agreement with our best taxonomic knowledge, even in noncoding LTR (Long Terminal Repeat) regions that are not tractable by regular alignment methods due to frequent duplications/insertions/deletions. Our method, however, is not limited to HIV/SIV subtyping. It provides an alternative means of tree construction without a time-consuming alignment procedure.
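
To make the idea of an alignment-free dissimilarity matrix concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not describe; it substitutes a generic k-mer frequency comparison, with `k` standing in for a single user-chosen parameter. The function names and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq, k):
    """Return relative frequencies of all overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def dissimilarity_matrix(seqs, k=3):
    """Euclidean distance between k-mer profiles for every pair of sequences.

    No alignment is computed: each sequence is reduced to a frequency
    profile, and only the profiles are compared.
    """
    names = list(seqs)
    profiles = {name: kmer_profile(seqs[name], k) for name in names}
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        pa, pb = profiles[a], profiles[b]
        kmers = set(pa) | set(pb)
        d = sqrt(sum((pa.get(x, 0.0) - pb.get(x, 0.0)) ** 2 for x in kmers))
        dist[a][b] = dist[b][a] = d
    return dist


# Toy sequences, made up for illustration (not HIV/SIV data).
toy = {
    "s1": "ACGTACGTACGTAAGT",
    "s2": "ACGTACGTACGTACGT",
    "s3": "TTTTGGGGCCCCAAAA",
}
matrix = dissimilarity_matrix(toy, k=3)
```

A matrix of this kind could then be passed to any distance-based tree-building method, such as neighbor joining. Which tree-building step the authors used is not stated in the abstract.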

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1626333
Journal Information:
BMC Bioinformatics, Vol. 8, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
