DOE PAGES

Title: Community detection in sequence similarity networks based on attribute clustering

Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot be differentiated by these measures. To overcome this deficiency, the method constructs, for each pair alignment, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to three steps: clustering the link attribute vectors, selecting an optimal subset of links, and refining the community structure based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
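The abstract names partition density as the quantity used to refine the community structure but does not define it. As a rough illustration only, the sketch below assumes the widely used link-community definition, D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where cluster c has m_c links and n_c nodes and M is the total number of links. The function name `partition_density` and the input layout (a dictionary from node pairs to cluster labels) are hypothetical and are not taken from the ACDC implementation, which may define or compute the quantity differently.

```python
from collections import defaultdict


def partition_density(link_labels):
    """Partition density of a link clustering (assumed link-community definition).

    link_labels maps each link, given as a (node_u, node_v) tuple, to the
    label of the cluster that link was assigned to.
    """
    link_count = defaultdict(int)   # m_c: number of links in cluster c
    node_sets = defaultdict(set)    # nodes touched by the links of cluster c

    for (u, v), label in link_labels.items():
        link_count[label] += 1
        node_sets[label].update((u, v))

    total_links = sum(link_count.values())
    if total_links == 0:
        return 0.0

    weighted_sum = 0.0
    for label, m in link_count.items():
        n = len(node_sets[label])
        # A cluster with only two nodes contributes zero by convention.
        if n > 2:
            weighted_sum += m * (m - (n - 1)) / ((n - 2) * (n - 1))

    return 2.0 * weighted_sum / total_links


# Hypothetical toy network: one triangle cluster and one single-link cluster.
example = {
    ("a", "b"): 0,
    ("b", "c"): 0,
    ("a", "c"): 0,
    ("c", "d"): 1,
}
density = partition_density(example)
```

Under this definition a cluster whose links form a clique contributes its full weight and a tree-like cluster contributes nothing, so the value rises when links are grouped into dense, well-separated communities.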
Authors:
 Chowdhary, Janamejaya [1] ; Loeffler, Frank E. [2] ; Smith, Jeremy C. [3]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Molecular Biophysics; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Joint Inst. for Biological Sciences and Biosciences Division; Univ. of Tennessee, Knoxville, TN (United States)
  2. Univ. of Tennessee, Knoxville, TN (United States). Dept. of Microbiology, Dept. of Civil and Environmental Engineering; Univ. of Tennessee, Knoxville, TN (United States). Center for Environmental Biotechnology; Univ. of Tennessee, Knoxville, TN (United States). Dept. of Biochemistry and Cellular and Molecular Biology
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Molecular Biophysics; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Joint Inst. for Biological Sciences and Biosciences Division; Univ. of Tennessee, Knoxville, TN (United States). Dept. of Biochemistry and Cellular and Molecular Biology
Publication Date:
Grant/Contract Number:
AC05-00OR22725
Type:
Accepted Manuscript
Journal Name:
PLoS ONE
Additional Journal Information:
Journal Volume: 12; Journal Issue: 7; Journal ID: ISSN 1932-6203
Publisher:
Public Library of Science
Research Org:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES
OSTI Identifier:
1399395

Chowdhary, Janamejaya, Loeffler, Frank E., and Smith, Jeremy C. Community detection in sequence similarity networks based on attribute clustering. United States: N. p., Web. doi:10.1371/journal.pone.0178650.
Chowdhary, Janamejaya, Loeffler, Frank E., & Smith, Jeremy C. Community detection in sequence similarity networks based on attribute clustering. United States. doi:10.1371/journal.pone.0178650.
Chowdhary, Janamejaya, Loeffler, Frank E., and Smith, Jeremy C. 2017. "Community detection in sequence similarity networks based on attribute clustering". United States. doi:10.1371/journal.pone.0178650. https://www.osti.gov/servlets/purl/1399395.
@article{osti_1399395,
title = {Community detection in sequence similarity networks based on attribute clustering},
author = {Chowdhary, Janamejaya and Loeffler, Frank E. and Smith, Jeremy C.},
abstractNote = {Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot be differentiated by these measures. To overcome this deficiency, the method constructs, for each pair alignment, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to three steps: clustering the link attribute vectors, selecting an optimal subset of links, and refining the community structure based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments},
doi = {10.1371/journal.pone.0178650},
journal = {PLoS ONE},
number = 7,
volume = 12,
place = {United States},
year = {2017},
month = {7}
}
