OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gkx1313 · OSTI ID: 1439241
 [1];  [2];  [3];  [2];  [4]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  3. Centre for Research & Technology Hellas, Thessalonica (Greece). Biological Computation & Process Lab. Chemical Process & Energy Resources Inst.
  4. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks and protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to the increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches for clustering sequence similarity or expression networks. Despite its popularity, MCL’s scalability to large datasets remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
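
For readers unfamiliar with the MCL algorithm that the abstract refers to, the following is a minimal serial sketch of its commonly described core loop: expansion (squaring a column-stochastic matrix) alternated with inflation (element-wise powering followed by column renormalization). It is not HipMCL's code and does not reflect its distributed design. It uses dense pure-Python lists, omits the pruning step that practical implementations rely on, and all function names, the toy graph, and the default parameters are assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def expand(m):
    """Expansion: square the matrix, i.e. take two random-walk steps."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Illustrative dense MCL; parameter defaults are assumptions."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that retains mass acts as an attractor; the columns with
    # non-negligible entries in that row form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    toy[a][b] = toy[b][a] = 1.0

print(mcl(toy))
```

The dense triple loop in `expand` costs cubic time and quadratic memory, which is the kind of cost that makes a naive approach impractical at the network sizes quoted in the abstract. The sketch is meant only to show the shape of the iteration.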

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1439241
Journal Information:
Nucleic Acids Research, Vol. 46, Issue 6; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 62 works
Citation information provided by
Web of Science
