DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Zero-truncated Poisson regression for sparse multiway count data corrupted by false zeros

Journal Article · Information and Inference (Online)

Abstract: We propose a novel statistical inference methodology for multiway count data that is corrupted by false zeros that are indistinguishable from true zero counts. Our approach consists of zero-truncating the Poisson distribution to neglect all zero values. This simple truncated approach dispenses with the need to distinguish between true and false zero counts and reduces the amount of data to be processed. Inference is accomplished via tensor completion that imposes low-rank tensor structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $\boldsymbol{\mathscr{M}}\in (0,\infty)^{I\times \cdots \times I}$ generating Poisson observations can be accurately estimated by zero-truncated Poisson regression from approximately $IR^2\log_2^2(I)$ non-zero counts under the nonnegative canonical polyadic decomposition. Our result also quantifies the error made by zero-truncating the Poisson distribution when the parameter is uniformly bounded from below. Therefore, under a low-rank multiparameter model, we propose an implementable approach guaranteed to achieve accurate regression in under-determined scenarios with substantial corruption by false zeros. Several numerical experiments are presented to explore the theoretical results.
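To illustrate the zero-truncation idea in the abstract, the following is a minimal single-parameter sketch, not the article's tensor method. It uses the standard zero-truncated Poisson probability P(X = k | X > 0) = λ^k e^(−λ) / (k! (1 − e^(−λ))) for k ≥ 1 and estimates λ from the non-zero counts alone. The function names, the bisection solver, and the assumption that false zeros occur independently of the underlying count are illustrative choices that do not come from the article.

```python
import math
import random


def ztp_neg_log_likelihood(lam, counts):
    """Negative log-likelihood of a zero-truncated Poisson(lam) for counts >= 1."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    nll = 0.0
    for k in counts:
        if k < 1:
            raise ValueError("zero-truncated data must contain only counts >= 1")
        # -log P(X = k | X > 0) = lam - k*log(lam) + log(k!) + log(1 - exp(-lam))
        nll += lam - k * math.log(lam) + math.lgamma(k + 1) + math.log(-math.expm1(-lam))
    return nll


def ztp_mle(counts, tol=1e-12):
    """Maximum-likelihood estimate of lam: solve lam / (1 - exp(-lam)) = mean(counts)."""
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a positive estimate")
    # The truncated mean is increasing in lam and always exceeds lam,
    # so the root lies in (0, mean) and bisection applies.
    lo, hi = 1e-12, mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_nonzero_counts(lam, n, false_zero_prob, rng):
    """Simulate n Poisson counts, zero some at random (assumed independent
    of the count), and keep only the non-zero observations."""
    kept = []
    for _ in range(n):
        k = sample_poisson(lam, rng)
        if rng.random() < false_zero_prob:
            k = 0  # false zero, indistinguishable from a true zero
        if k > 0:
            kept.append(k)
    return kept
```

For example, `ztp_mle(simulate_nonzero_counts(2.0, 20000, 0.4, random.Random(0)))` estimates λ from the surviving non-zero counts only. Under the independence assumption stated above, discarding all zeros leaves the conditional distribution of the remaining counts unchanged, which is why the fraction of false zeros never has to be modeled in this sketch.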

Sponsoring Organization:
USDOE
OSTI ID:
1973298
Alternate ID(s):
OSTI ID: 2311572
Journal Information:
Information and Inference (Online), Vol. 12, Issue 3; ISSN 2049-8772
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
