A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions using Sequential Quadratic Programming
Abstract
Maximum likelihood estimation of mixture proportions has a long history, and continues to play an important role in modern statistics, including in the development of nonparametric empirical Bayes methods. Maximum likelihood estimation of mixture proportions has traditionally been carried out using the expectation maximization (EM) algorithm, but recent work by Koenker and Mizera shows that modern convex optimization techniques, in particular interior point methods, are substantially faster and more accurate than EM. Here, we develop a new solution based on sequential quadratic programming (SQP). It is substantially faster than the interior point method, and just as accurate. Our approach combines several ideas: first, it solves a reformulation of the original problem; second, it uses an SQP approach to make the best use of the expensive gradient and Hessian computations; third, the SQP iterations are implemented using an active set method to exploit the sparse nature of the quadratic subproblems; fourth, it uses accurate low-rank approximations for more efficient gradient and Hessian computations. We illustrate the benefits of the SQP approach in experiments on synthetic datasets and a large genetic association dataset. In large datasets (n ≈ 10^6 observations, m ≈ 10^3 mixture components), our implementation achieves at least a 100-fold reduction in runtime compared with a state-of-the-art interior point solver. Our methods are implemented in Julia and in an R package available on CRAN. Supplementary materials for this article are available online.
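For context, the optimization problem the abstract refers to is maximization of sum_i log((Lx)_i) over the probability simplex, where L is the n × m matrix of component likelihoods L[i, k] = p(data_i | component k) and x holds the mixture proportions. The following is a minimal sketch in Python/NumPy of the classical EM baseline that the paper's mix-SQP method is compared against (this is the textbook EM update, not the paper's algorithm; the function name is illustrative):

```python
import numpy as np

def em_mix_props(L, num_iters=200):
    """EM updates for maximizing sum_i log((L x)_i) over the simplex.

    L is an n x m matrix of nonnegative likelihoods. Each EM step
    multiplies x elementwise by the average responsibility weight,
    which keeps x nonnegative and summing to one.
    """
    n, m = L.shape
    x = np.full(m, 1.0 / m)            # start at uniform proportions
    for _ in range(num_iters):
        p = L @ x                      # mixture densities, length n
        x = x * (L.T @ (1.0 / p)) / n  # multiplicative EM update
    return x
```

Each update is cheap (two matrix-vector products), but EM's convergence is slow; the paper's point is that second-order methods like interior point and SQP reach an accurate solution in far fewer, better-used iterations.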
- Authors:
- Kim, Youngseok; Carbonetto, Peter; Stephens, Matthew; Anitescu, Mihai
- Univ. of Chicago, IL (United States)
- Univ. of Chicago, IL (United States); Argonne National Lab. (ANL), Lemont, IL (United States)
- Publication Date:
- January 8, 2020
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- National Science Foundation (NSF); National Institutes of Health (NIH); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1660711
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Computational and Graphical Statistics
- Additional Journal Information:
- Journal Volume: 29; Journal Issue: 2; Journal ID: ISSN 1061-8600
- Publisher:
- Taylor & Francis
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Kim, Youngseok, Carbonetto, Peter, Stephens, Matthew, and Anitescu, Mihai. A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions using Sequential Quadratic Programming. United States: N. p., 2020.
Web. doi:10.1080/10618600.2019.1689985.
Kim, Youngseok, Carbonetto, Peter, Stephens, Matthew, & Anitescu, Mihai. A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions using Sequential Quadratic Programming. United States. https://doi.org/10.1080/10618600.2019.1689985
Kim, Youngseok, Carbonetto, Peter, Stephens, Matthew, and Anitescu, Mihai. 2020. "A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions using Sequential Quadratic Programming". United States. https://doi.org/10.1080/10618600.2019.1689985. https://www.osti.gov/servlets/purl/1660711.
@article{osti_1660711,
title = {A Fast Algorithm for Maximum Likelihood Estimation of Mixture Proportions using Sequential Quadratic Programming},
author = {Kim, Youngseok and Carbonetto, Peter and Stephens, Matthew and Anitescu, Mihai},
abstractNote = {Maximum likelihood estimation of mixture proportions has a long history, and continues to play an important role in modern statistics, including in the development of nonparametric empirical Bayes methods. Maximum likelihood estimation of mixture proportions has traditionally been carried out using the expectation maximization (EM) algorithm, but recent work by Koenker and Mizera shows that modern convex optimization techniques, in particular interior point methods, are substantially faster and more accurate than EM. Here, we develop a new solution based on sequential quadratic programming (SQP). It is substantially faster than the interior point method, and just as accurate. Our approach combines several ideas: first, it solves a reformulation of the original problem; second, it uses an SQP approach to make the best use of the expensive gradient and Hessian computations; third, the SQP iterations are implemented using an active set method to exploit the sparse nature of the quadratic subproblems; fourth, it uses accurate low-rank approximations for more efficient gradient and Hessian computations. We illustrate the benefits of the SQP approach in experiments on synthetic datasets and a large genetic association dataset. In large datasets (n ≈ 10^6 observations, m ≈ 10^3 mixture components), our implementation achieves at least a 100-fold reduction in runtime compared with a state-of-the-art interior point solver. Our methods are implemented in Julia and in an R package available on CRAN. Supplementary materials for this article are available online.},
doi = {10.1080/10618600.2019.1689985},
journal = {Journal of Computational and Graphical Statistics},
number = 2,
volume = 29,
place = {United States},
year = {2020},
month = {jan}
}
Works referenced in this record:
Mixture Densities, Maximum Likelihood and the EM Algorithm
journal, April 1984
- Redner, Richard A.; Walker, Homer F.
- SIAM Review, Vol. 26, Issue 2
A Stochastic Quasi-Newton Method for Large-Scale Optimization
journal, January 2016
- Byrd, R. H.; Hansen, S. L.; Nocedal, Jorge
- SIAM Journal on Optimization, Vol. 26, Issue 2
Contributions to the Mathematical Theory of Evolution
journal, January 1894
- Pearson, K.
- Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 185, Issue 0
Convex Optimization, Shape Constraints, Compound Decisions, and Empirical Bayes Rules
journal, April 2014
- Koenker, Roger; Mizera, Ivan
- Journal of the American Statistical Association, Vol. 109, Issue 506
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
journal, January 2011
- Halko, N.; Martinsson, P. G.; Tropp, J. A.
- SIAM Review, Vol. 53, Issue 2
The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog)
journal, November 2016
- MacArthur, Jacqueline; Bowler, Emily; Cerezo, Maria
- Nucleic Acids Research, Vol. 45, Issue D1
Deconvolution of a Distribution Function
journal, December 1997
- Cordy, Clifford B.; Thomas, David R.
- Journal of the American Statistical Association, Vol. 92, Issue 440
REBayes: An R Package for Empirical Bayes Mixture Methods
journal, January 2017
- Koenker, Roger; Gu, Jiaying
- Journal of Statistical Software, Vol. 82, Issue 8
Second-Order Stochastic Optimization for Machine Learning in Linear Time
text, January 2016
- Agarwal, Naman; Bullins, Brian; Hazan, Elad
- arXiv
Interior-point methods
journal, December 2000
- Potra, Florian A.; Wright, Stephen J.
- Journal of Computational and Applied Mathematics, Vol. 124, Issue 1-2
Julia: A Fast Dynamic Language for Technical Computing
preprint, January 2012
- Bezanson, Jeff; Karpinski, Stefan; Shah, Viral B.
- arXiv
Nonmonotone Spectral Projected Gradient Methods on Convex Sets
journal, January 2000
- Birgin, Ernesto G.; Martínez, José Mario; Raydan, Marcos
- SIAM Journal on Optimization, Vol. 10, Issue 4
Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters
journal, December 1956
- Kiefer, J.; Wolfowitz, J.
- The Annals of Mathematical Statistics, Vol. 27, Issue 4
K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation
journal, November 2006
- Aharon, M.; Elad, M.; Bruckstein, A.
- IEEE Transactions on Signal Processing, Vol. 54, Issue 11
The performance of standard and hybrid EM algorithms for ML estimates of the normal mixture model with censoring
journal, December 1992
- Atkinson, Scott E.
- Journal of Statistical Computation and Simulation, Vol. 44, Issue 1-2
On Convergence Properties of the EM Algorithm for Gaussian Mixtures
journal, January 1996
- Xu, Lei; Jordan, Michael I.
- Neural Computation, Vol. 8, Issue 1
Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm
journal, June 2008
- Varadhan, Ravi; Roland, Christophe
- Scandinavian Journal of Statistics, Vol. 35, Issue 2
The Mosek Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm
book, January 2000
- Andersen, Erling D.; Andersen, Knud D.
- Applied Optimization
Maximum Likelihood from Incomplete Data Via the EM Algorithm
journal, September 1977
- Dempster, A. P.; Laird, N. M.; Rubin, D. B.
- Journal of the Royal Statistical Society: Series B (Methodological), Vol. 39, Issue 1
Defining the role of common variation in the genomic and biological architecture of adult human height
journal, October 2014
- Wood, Andrew R.; Esko, Tonu; Yang, Jian
- Nature Genetics, Vol. 46, Issue 11
Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences
journal, August 2004
- Johnstone, Iain M.; Silverman, Bernard W.
- The Annals of Statistics, Vol. 32, Issue 4
Nonparametric Maximum Likelihood Estimation of a Mixing Distribution
journal, December 1978
- Laird, Nan
- Journal of the American Statistical Association, Vol. 73, Issue 364
Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach
journal, January 2010
- Kim, Dongmin; Sra, Suvrit; Dhillon, Inderjit S.
- SIAM Journal on Scientific Computing, Vol. 32, Issue 6
JuMP: A Modeling Language for Mathematical Optimization
journal, January 2017
- Dunning, Iain; Huchette, Joey; Lubin, Miles
- SIAM Review, Vol. 59, Issue 2
Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions
text, January 2009
- Halko, Nathan; Martinsson, Per-Gunnar; Tropp, Joel A.
- arXiv
Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of normal means
journal, August 2009
- Brown, Lawrence D.; Greenshtein, Eitan
- The Annals of Statistics, Vol. 37, Issue 4
General maximum likelihood empirical Bayes estimation of normal means
journal, August 2009
- Jiang, Wenhua; Zhang, Cun-Hui
- The Annals of Statistics, Vol. 37, Issue 4
Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences
text, January 2004
- Johnstone, Iain M.; Silverman, Bernard W.
- arXiv
Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of normal means
text, January 2009
- Brown, Lawrence D.; Greenshtein, Eitan
- arXiv
A Stochastic Approximation Method
journal, September 1951
- Robbins, Herbert; Monro, Sutton
- The Annals of Mathematical Statistics, Vol. 22, Issue 3
Efficient projections onto the ℓ1-ball for learning in high dimensions
conference, January 2008
- Duchi, John; Shalev-Shwartz, Shai; Singer, Yoram
- Proceedings of the 25th international conference on Machine learning - ICML '08
Works referencing / citing this record:
Solving the Empirical Bayes Normal Means Problem with Correlated Noise
preprint, January 2018
- Sun, Lei; Stephens, Matthew
- arXiv