The maximum clique enumeration problem: algorithms, applications, and implementations

Eblen, John D.; Phillips, Charles A.; Rogers, Gary L.; Langston, Michael A.

doi:10.1186/1471-2105-13-s10-s5

Journal Article · Mon Jun 25 00:00:00 EDT 2012 · BMC Bioinformatics

DOI: https://doi.org/10.1186/1471-2105-13-s10-s5 · OSTI ID: 1626291

Eblen, John D. ^[1]; Phillips, Charles A. ^[2]; Rogers, Gary L. ^[2]; Langston, Michael A. ^[2]

[1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Molecular Biophysics; DOE/OSTI
[2] Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science

Background: The maximum clique enumeration (MCE) problem asks that we identify all maximum cliques in a finite, simple graph. MCE is closely related to two other well-known and widely studied problems: the maximum clique optimization problem, which asks us to determine the size of a largest clique, and the maximal clique enumeration problem, which asks that we compile a listing of all maximal cliques. Naturally, these three problems are NP-hard, given that they subsume the classic version of the NP-complete clique decision problem. MCE can be solved in principle with standard enumeration methods due to Bron, Kerbosch, Kose and others. Unfortunately, these techniques are ill-suited to graphs encountered in our applications. We must solve MCE on instances deeply seeded in data mining and computational biology, where high-throughput data capture often creates graphs of extreme size and density. MCE can also be solved in principle using more modern algorithms based in part on vertex cover and the theory of fixed-parameter tractability (FPT). While FPT is an improvement, these algorithms too can fail to scale sufficiently well as the sizes and densities of our datasets grow.

Results: An extensive testbed of benchmark graphs is created using publicly available transcriptomic datasets from the Gene Expression Omnibus (GEO). Empirical testing reveals crucial but latent features of such high-throughput biological data. In turn, it is shown that these features distinguish real data from random data intended to reproduce salient topological features. In particular, with real data there tends to be an unusually high degree of maximum clique overlap. Armed with this knowledge, novel decomposition strategies are tuned to the data and coupled with the best FPT MCE implementations.

Conclusions: Several algorithmic improvements to MCE are made which progressively decrease the run time on graphs in the testbed. Frequently the final runtime improvement is several orders of magnitude. As a result, instances which were once prohibitively time-consuming to solve are brought into the domain of realistic feasibility.
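
To make the distinction between maximal and maximum cliques concrete, the following is a minimal Python sketch of the classical approach the abstract attributes to Bron and Kerbosch: enumerate every maximal clique, then keep only those of largest size. It is not the FPT-based method developed in the paper. The pivoting rule, the adjacency-dictionary representation, and the names `maximal_cliques`, `maximum_cliques`, and `example` are assumptions made for illustration only.

```python
from typing import Dict, List, Set

# Assumed representation: vertex -> set of neighbouring vertices (simple, undirected graph).
Graph = Dict[int, Set[int]]


def maximal_cliques(graph: Graph) -> List[Set[int]]:
    """Enumerate all maximal cliques using Bron-Kerbosch with pivoting."""
    found: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates that extend r, x: already-processed vertices.
        if not p and not x:
            found.append(set(r))  # nothing can extend r, so r is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates;
        # only candidates outside the pivot's neighbourhood need to be branched on.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return found


def maximum_cliques(graph: Graph) -> List[Set[int]]:
    """Solve MCE naively: keep only the maximal cliques of largest size."""
    cliques = maximal_cliques(graph)
    best = max(len(c) for c in cliques)
    return [c for c in cliques if len(c) == best]


# Hypothetical example: two triangles sharing the edge 1-2, plus a pendant edge 3-4.
example: Graph = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4},
    4: {3},
}

if __name__ == "__main__":
    print(sorted(sorted(c) for c in maximal_cliques(example)))
    print(sorted(sorted(c) for c in maximum_cliques(example)))
```

In this small example the edge {3, 4} is a maximal clique but not a maximum one, while the two maximum cliques {0, 1, 2} and {1, 2, 3} share the vertices 1 and 2, a toy instance of the maximum clique overlap the abstract describes. Because this sketch lists every maximal clique before filtering, it illustrates why such enumeration can be impractical on large, dense graphs.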

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division

Grant/Contract Number: AC05-00OR22725

OSTI ID: 1626291

Journal Information: BMC Bioinformatics, Vol. 13, Issue Suppl 10; ISSN 1471-2105

Publisher: BioMed Central

Country of Publication: United States

Language: English

References (19)

On Parameterized Enumeration Fernau, Henning Lecture Notes in Computer Science https://doi.org/10.1007/3-540-45655-4_60	book	January 2002
Scalable Parallel Algorithms for FPT Problems Abu-Khzam, Faisal N.; Langston, Michael A.; Shanbhag, Pushkar Algorithmica, Vol. 45, Issue 3 https://doi.org/10.1007/s00453-006-1214-1	journal	April 2006
A game–theoretic approach to partial clique enumeration Rota Bulò, Samuel; Torsello, Andrea; Pelillo, Marcello Image and Vision Computing, Vol. 27, Issue 7 https://doi.org/10.1016/j.imavis.2008.10.003	journal	June 2009
The worst-case time complexity for generating all maximal cliques and computational experiments Tomita, Etsuji; Tanaka, Akira; Takahashi, Haruhisa Theoretical Computer Science, Vol. 363, Issue 1 https://doi.org/10.1016/j.tcs.2006.06.015	journal	October 2006
Uncovering the overlapping community structure of complex networks in nature and society Palla, Gergely; Derényi, Imre; Farkas, Illés Nature, Vol. 435, Issue 7043 https://doi.org/10.1038/nature03607	journal	June 2005
Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function Chesler, Elissa J.; Lu, Lu; Shou, Siming Nature Genetics, Vol. 37, Issue 3 https://doi.org/10.1038/ng1518	journal	February 2005
Synergism between IL7R and CXCR4 drives BCR-ABL induced transformation in Philadelphia chromosome-positive acute lymphoblastic leukemia Abdelrasoul, Hend; Vadakumchery, Anila; Werner, Markus Nature Communications, Vol. 11, Issue 1 https://doi.org/10.1038/s41467-020-16927-w	journal	June 2020
Emergence of Scaling in Random Networks Barabási, Albert-László; Albert, Réka Science, Vol. 286, Issue 5439 https://doi.org/10.1126/science.286.5439.509	journal	October 1999
Algorithm 457: finding all cliques of an undirected graph Bron, Coen; Kerbosch, Joep Communications of the ACM, Vol. 16, Issue 9 https://doi.org/10.1145/362342.362367	journal	September 1973
Computational, Integrative, and Comparative Methods for the Elucidation of Genetic Coexpression Networks Baldwin, Nicole E.; Chesler, Elissa J.; Kirov, Stefan Journal of Biomedicine and Biotechnology, Vol. 2005, Issue 2 https://doi.org/10.1155/jbb.2005.172	journal	January 2005
Threshold selection in gene co-expression networks using spectral graph theory techniques Perkins, Andy D.; Langston, Michael A. BMC Bioinformatics, Vol. 10, Issue S11 https://doi.org/10.1186/1471-2105-10-s11-s4	journal	October 2009
Parameterized Complexity Downey, R. G.; Fellows, M. R. Monographs in Computer Science https://doi.org/10.1007/978-1-4612-0515-9	book	January 1999
On cliques in graphs Moon, J. W.; Moser, L. Israel Journal of Mathematics, Vol. 3, Issue 1 https://doi.org/10.1007/BF02760024	journal	March 1965
An Efficient Branch-and-bound Algorithm for Finding a Maximum Clique with Computational Experiments Tomita, Etsuji; Kameda, Toshikatsu Journal of Global Optimization, Vol. 37, Issue 1 https://doi.org/10.1007/s10898-006-9039-7	journal	July 2006
Visualizing plant metabolomic correlation networks using clique-metabolite matrices Kose, F.; Weckwerth, W.; Linke, T. Bioinformatics, Vol. 17, Issue 12 https://doi.org/10.1093/bioinformatics/17.12.1198	journal	December 2001
Gene Expression Omnibus: NCBI gene expression and hybridization array data repository Edgar, R. Nucleic Acids Research, Vol. 30, Issue 1 https://doi.org/10.1093/nar/30.1.207	journal	January 2002
Computational, Integrative, and Comparative Methods for the Elucidation of Genetic Coexpression Networks Baldwin, Nicole E.; Chesler, Elissa J.; Kirov, Stefan Journal of Biomedicine and Biotechnology, Vol. 2005, Issue 2 https://doi.org/10.1155/JBB.2005.172	journal	January 2005
Maximal Consistent Subsets Malouf, Robert Computational Linguistics, Vol. 33, Issue 2 https://doi.org/10.1162/coli.2007.33.2.153	journal	June 2007
Threshold selection in gene co-expression networks using spectral graph theory techniques Perkins, Andy D.; Langston, Michael A. BMC Bioinformatics, Vol. 10, Issue S11 https://doi.org/10.1186/1471-2105-10-S11-S4	journal	October 2009

Cited By (6)

ModuleDiscoverer: Identification of regulatory modules in protein-protein interaction networks Vlaic, Sebastian; Tokarski-Schnelle, Christian; Gustafsson, Mika Scientific Reports https://doi.org/10.1101/119099	journal	March 2017
Fine-Grained Search Space Classification for Hard Enumeration Variants of Subset Problems Lauri, Juho; Dutta, Sourav Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, Issue 01 https://doi.org/10.1609/aaai.v33i01.33012314	journal	July 2019
A multi-tissue genome-scale metabolic modeling framework for the analysis of whole plant systems Gomes de Oliveira Dal'Molin, Cristiana; Quek, Lake-Ee; Saa, Pedro A. Frontiers in Plant Science, Vol. 6 https://doi.org/10.3389/fpls.2015.00004	journal	January 2015
Recent Advances in Practical Data Reduction Abu-Khzam, Faisal; Lamm, Sebastian; Mnich, Matthias arXiv https://doi.org/10.48550/arxiv.2012.12594	preprint	January 2020
The PACE 2022 Parameterized Algorithms and Computational Experiments Challenge: Directed Feedback Vertex Set Großmann, E.; Heuer, T.; Schulz, C. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH (LZI) https://doi.org/10.5445/ir/1000154537	text	January 2022
A robustness metric for biological data clustering algorithms Lu, Yuping; Phillips, Charles A.; Langston, Michael A. BMC Bioinformatics, Vol. 20, Issue S15 https://doi.org/10.1186/s12859-019-3089-6	journal	December 2019

Similar Records

Maximal clique enumeration with data-parallel primitives

Conference · Sun Oct 01 00:00:00 EDT 2017 · OSTI ID:1440003

Finding Maximum Cliques on the D-Wave Quantum Annealer

Journal Article · Wed May 02 20:00:00 EDT 2018 · Journal of Signal Processing Systems · OSTI ID:1438358

A quadratic 0-1 optimization algorithm for the maximum clique and stable set problems

Conference · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:35856

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
Biochemistry & Molecular Biology
Biotechnology & Applied Microbiology
Mathematical & Computational Biology
