# Simple, direct and efficient multi-way spectral clustering

## Abstract

We present a new algorithm for spectral clustering based on a column-pivoted QR factorization that may be directly used for cluster assignment or to provide an initial guess for k-means. Our algorithm is simple to implement, direct and requires no initial guess. Furthermore, it scales linearly in the number of nodes of the graph and a randomized variant provides significant computational gains. Provided the subspace spanned by the eigenvectors used for clustering contains a basis that resembles the set of indicator vectors on the clusters, we prove that both our deterministic and randomized algorithms recover a basis close to the indicators in Frobenius norm. We also experimentally demonstrate that the performance of our algorithm tracks recent information theoretic bounds for exact recovery in the stochastic block model. Finally, we explore the performance of our algorithm when applied to a real-world graph.
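The abstract describes the method only at a high level. What follows is a minimal NumPy sketch of the idea, not the authors' reference implementation. It assumes `V` is an n-by-k matrix whose columns span the eigenvectors used for clustering. The sketch picks k representative nodes by greedy column pivoting on `V.T`, which stands in for a LAPACK-style column-pivoted QR. It then rotates `V` by the orthogonal polar factor of the selected k-by-k block and assigns each node to the column where its rotated row is largest in magnitude. The helper names `cpqr_pivots` and `cpqr_cluster`, the `n_samples`/`seed` parameters, and the row-norm sampling used for the randomized path are illustrative assumptions.

```python
import numpy as np


def cpqr_pivots(Z, k):
    """Return k column indices of Z chosen by greedy column pivoting.

    Mimics the pivot choice of column-pivoted QR (Businger-Golub): at each
    step take the column with the largest residual norm, then project that
    direction out of every column.
    """
    R = np.array(Z, dtype=float)            # working copy of the columns
    chosen = np.zeros(R.shape[1], dtype=bool)
    pivots = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[chosen] = -1.0                # never pick a column twice
        j = int(np.argmax(norms))
        pivots.append(j)
        chosen[j] = True
        q = R[:, j] / norms[j]              # unit vector along the pivot column
        R -= np.outer(q, q @ R)             # Gram-Schmidt step on all columns
    return np.array(pivots)


def cpqr_cluster(V, n_samples=None, seed=None):
    """Cluster n nodes given an n-by-k basis V of the spectral subspace.

    Returns (labels, reps): labels[j] lies in {0, ..., k-1}, and reps are the
    k node indices selected by the pivoting step.
    """
    n, k = V.shape
    Z = V.T                                 # k-by-n: one column per node
    if n_samples is None:
        candidates = np.arange(n)           # deterministic: search all nodes
    else:
        # Hypothetical randomized variant (assumption): only search nodes
        # sampled with probability proportional to their squared row norm.
        # The sample must contain at least one node from every cluster.
        p = np.sum(V**2, axis=1)
        rng = np.random.default_rng(seed)
        candidates = np.unique(rng.choice(n, size=n_samples, p=p / p.sum()))
    reps = candidates[cpqr_pivots(Z[:, candidates], k)]
    # Orthogonal polar factor U @ Wt of the selected k-by-k block Z[:, reps].
    U, _, Wt = np.linalg.svd(Z[:, reps])
    rotated = V @ (U @ Wt)                  # columns should resemble indicators
    labels = np.argmax(np.abs(rotated), axis=1)
    return labels, reps


if __name__ == "__main__":
    # Toy data (hypothetical, not from the paper): a slightly perturbed,
    # randomly rotated basis of normalized cluster indicator vectors.
    rng = np.random.default_rng(0)
    truth = np.repeat(np.arange(3), [30, 20, 10])
    E = np.zeros((truth.size, 3))
    E[np.arange(truth.size), truth] = 1.0
    E /= np.linalg.norm(E, axis=0)                      # orthonormal indicators
    O, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # random rotation
    V = E @ O + 1e-3 * rng.standard_normal(E.shape)     # perturbed basis
    labels, reps = cpqr_cluster(V)
    print(labels, reps)
```

For a fixed k, every step above touches each of the n columns only a constant number of times, so the cost grows linearly in n. The rows `V[reps]` could also be passed to a k-means routine as initial centroids, which would be one way to realize the second use the abstract mentions.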

- Authors: Damle, Anil; Minden, Victor; Ying, Lexing
  - Department of Computer Science, Cornell University, Gates Hall, Ithaca, NY
  - Center for Computational Biology, Flatiron Institute, Fifth Avenue, New York, NY
  - Department of Mathematics and Institute for Computational & Mathematical Engineering, Stanford University, Serra Mall, Bldg, Stanford, CA
- Publication Date: June 2018
- Sponsoring Org.: USDOE
- OSTI Identifier: 1457488
- Grant/Contract Number: FG02-97ER25308; FC02-13ER26134; SC0009409
- Resource Type: Published Article
- Journal Name: Information and Inference: A Journal of the IMA
- Additional Journal Information: Journal Volume: 8; Journal Issue: 1; Journal ID: ISSN 2049-8772
- Publisher: Oxford University Press
- Country of Publication: Country unknown/Code not available
- Language: English

### Citation Formats

```
Damle, Anil, Minden, Victor, and Ying, Lexing. Simple, direct and efficient multi-way spectral clustering. Country unknown/Code not available: N. p., 2018. Web. doi:10.1093/imaiai/iay008.
```

```
Damle, Anil, Minden, Victor, & Ying, Lexing. Simple, direct and efficient multi-way spectral clustering. Country unknown/Code not available. doi:10.1093/imaiai/iay008.
```

```
Damle, Anil, Minden, Victor, and Ying, Lexing. 2018. "Simple, direct and efficient multi-way spectral clustering". Country unknown/Code not available. doi:10.1093/imaiai/iay008.
```

```
@article{osti_1457488,
  title = {Simple, direct and efficient multi-way spectral clustering},
  author = {Damle, Anil and Minden, Victor and Ying, Lexing},
  abstractNote = {We present a new algorithm for spectral clustering based on a column-pivoted QR factorization that may be directly used for cluster assignment or to provide an initial guess for k-means. Our algorithm is simple to implement, direct and requires no initial guess. Furthermore, it scales linearly in the number of nodes of the graph and a randomized variant provides significant computational gains. Provided the subspace spanned by the eigenvectors used for clustering contains a basis that resembles the set of indicator vectors on the clusters, we prove that both our deterministic and randomized algorithms recover a basis close to the indicators in Frobenius norm. We also experimentally demonstrate that the performance of our algorithm tracks recent information theoretic bounds for exact recovery in the stochastic block model. Finally, we explore the performance of our algorithm when applied to a real-world graph.},
  doi = {10.1093/imaiai/iay008},
  journal = {Information and Inference: A Journal of the IMA},
  number = {1},
  volume = {8},
  place = {Country unknown/Code not available},
  year = {2018},
  month = {6}
}
```

DOI: 10.1093/imaiai/iay008
