Title: The maximum clique enumeration problem: algorithms, applications, and implementations

Journal Article · · BMC Bioinformatics
 [1];  [2];  [2];  [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Molecular Biophysics
  2. Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science

Background: The maximum clique enumeration (MCE) problem asks that we identify all maximum cliques in a finite, simple graph. MCE is closely related to two other well-known and widely studied problems: the maximum clique optimization problem, which asks us to determine the size of a largest clique, and the maximal clique enumeration problem, which asks that we compile a listing of all maximal cliques. Naturally, these three problems are NP-hard, given that they subsume the classic version of the NP-complete clique decision problem. MCE can be solved in principle with standard enumeration methods due to Bron, Kerbosch, Kose and others. Unfortunately, these techniques are ill-suited to graphs encountered in our applications. We must solve MCE on instances deeply seeded in data mining and computational biology, where high-throughput data capture often creates graphs of extreme size and density. MCE can also be solved in principle using more modern algorithms based in part on vertex cover and the theory of fixed-parameter tractability (FPT). While FPT is an improvement, these algorithms too can fail to scale sufficiently well as the sizes and densities of our datasets grow.

Results: An extensive testbed of benchmark graphs is created using publicly available transcriptomic datasets from the Gene Expression Omnibus (GEO). Empirical testing reveals crucial but latent features of such high-throughput biological data. In turn, it is shown that these features distinguish real data from random data intended to reproduce salient topological features. In particular, with real data there tends to be an unusually high degree of maximum clique overlap. Armed with this knowledge, novel decomposition strategies are tuned to the data and coupled with the best FPT MCE implementations.

Conclusions: Several algorithmic improvements to MCE are made which progressively decrease the runtime on graphs in the testbed. Frequently the final runtime improvement is several orders of magnitude. As a result, instances which were once prohibitively time-consuming to solve are brought into the domain of realistic feasibility.
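
To make the distinction between maximal and maximum cliques concrete, the following is a minimal Python sketch of the baseline approach the abstract attributes to Bron and Kerbosch: enumerate every maximal clique, then keep only those of largest size. It is not the FPT or vertex-cover-based method developed in the article, and it is not tuned for large, dense graphs. The function names (`from_edges`, `maximal_cliques`, `maximum_cliques`) and the toy graph are assumptions introduced purely for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from an edge list (hypothetical helper)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def maximal_cliques(graph: Graph) -> List[Set[Hashable]]:
    """Enumerate all maximal cliques using Bron-Kerbosch with pivoting."""
    found: List[Set[Hashable]] = []

    def expand(r: Set[Hashable], p: Set[Hashable], x: Set[Hashable]) -> None:
        # r: current clique, p: candidates that extend r, x: already-explored vertices
        if not p and not x:
            found.append(set(r))  # nothing can extend r, so r is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates;
        # only the pivot's non-neighbours need to be branched on.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return found


def maximum_cliques(graph: Graph) -> List[Set[Hashable]]:
    """Solve MCE naively: keep only the maximal cliques of largest size."""
    cliques = maximal_cliques(graph)
    best = max(len(c) for c in cliques)
    return [c for c in cliques if len(c) == best]


if __name__ == "__main__":
    # Toy graph: two triangles sharing the edge b-c, plus a pendant edge d-e.
    g = from_edges([("a", "b"), ("a", "c"), ("b", "c"),
                    ("b", "d"), ("c", "d"), ("d", "e")])
    print(sorted(sorted(c) for c in maximum_cliques(g)))
    # [['a', 'b', 'c'], ['b', 'c', 'd']]
```

In this toy graph the maximal cliques are {a, b, c}, {b, c, d} and {d, e}, but only the first two are maximum. They share the vertices b and c, a small-scale picture of the maximum clique overlap that the abstract reports for real transcriptomic data.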

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
Journal Information:
BMC Bioinformatics, Vol. 13, Issue Suppl 10; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
