DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Large scale maximum average power multiple inference on time‐course count data with application to RNA‐seq analysis

Journal Article · Biometrics
DOI: https://doi.org/10.1111/biom.13144 · OSTI ID:1604120
 [1];  [1];  [1];  [2]
  1. Department of Statistics, Colorado State University, Fort Collins, Colorado
  2. Department of Biology, Colorado State University, Fort Collins, Colorado

Abstract: Experiments that longitudinally collect RNA sequencing (RNA‐seq) data can provide transformative insights in biology research by revealing the dynamic patterns of genes. Such experiments create a great demand for new analytic approaches to identify differentially expressed (DE) genes based on large‐scale time‐course count data. Existing methods, however, are suboptimal with respect to power and may lack theoretical justification. Furthermore, most existing tests are designed to distinguish among conditions based on overall differential patterns across time, though in practice a variety of composite hypotheses are of more scientific interest. Finally, some current methods may fail to control the false discovery rate. In this paper, we propose a new model and testing procedure to address the above issues simultaneously. Specifically, conditional on a latent Gaussian mixture with evolving means, we model the data by negative binomial distributions. Motivated by Storey (2007) and Hwang and Liu (2010), we introduce a general testing framework based on the proposed model and show that the proposed test enjoys the optimality property of maximum average power. The test allows not only identification of traditional DE genes but also testing of a variety of composite hypotheses of biological interest. We establish the identifiability of the proposed model, implement the proposed method via efficient algorithms, and demonstrate its good performance via simulation studies. The procedure reveals interesting biological insights when applied to data from an experiment that examines the effect of varying light environments on the fundamental physiology of the marine diatom Phaeodactylum tricornutum.
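
The data-generating idea in the abstract (negative binomial counts, conditional on a latent Gaussian mixture whose means evolve over time) can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the record: the log link, the shared noise scale `sigma`, the single `dispersion` value, and the function name `simulate_time_course_counts` are all hypothetical choices for illustration, and the paper's actual parameterization may differ.

```python
import numpy as np


def simulate_time_course_counts(n_genes, component_means, weights,
                                sigma, dispersion, seed=0):
    """Simulate gene-by-time counts (illustrative assumptions only).

    component_means: array of shape (K, T); row k is the assumed
        time-evolving mean (log scale) of mixture component k.
    weights: length-K mixture probabilities (must sum to 1).
    sigma: assumed standard deviation of the latent Gaussian noise.
    dispersion: assumed negative binomial size parameter (> 0).
    """
    rng = np.random.default_rng(seed)
    component_means = np.asarray(component_means, dtype=float)
    n_components, n_times = component_means.shape

    # Latent component label for each gene.
    labels = rng.choice(n_components, size=n_genes, p=weights)

    # Latent Gaussian log-means: component trajectory plus noise.
    noise = sigma * rng.standard_normal((n_genes, n_times))
    mu = np.exp(component_means[labels] + noise)

    # NumPy's negative_binomial(n, p) has mean n * (1 - p) / p,
    # so p = r / (r + mu) yields mean mu for size r = dispersion.
    p = dispersion / (dispersion + mu)
    counts = rng.negative_binomial(dispersion, p)
    return labels, counts


# Hypothetical example: one flat component and one rising component.
labels, counts = simulate_time_course_counts(
    n_genes=200,
    component_means=[[2.0, 2.0, 2.0], [2.0, 3.0, 4.0]],
    weights=[0.7, 0.3],
    sigma=0.1,
    dispersion=5.0,
    seed=1,
)
```

Here genes assigned to the second (rising) component play the role of genes whose expression changes over time; the testing procedure itself is not sketched, since the record does not describe it in enough detail.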

Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0018344
Journal Information:
Journal Name: Biometrics; Journal Volume: 76; Journal Issue: 1; ISSN 0006-341X
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English

References (33)

The Phaeodactylum genome reveals the evolutionary history of diatom genomes · journal · October 2008
Comparative analysis of differential gene expression tools for RNA sequencing time course data · journal · October 2017
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data · journal · November 2009
Methods for time series analysis of RNA-seq data with application to human Th17 cell differentiation · journal · June 2014
Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series · journal · June 2014
Detecting time periods of differential gene expression using Gaussian processes: an application to endothelial cells exposed to radiotherapy dose fraction · journal · October 2014
EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments · journal · April 2015
An informative approach on differential abundance analysis for time-course metagenomic sequencing data · journal · January 2017
Tuning parameter selectors for the smoothly clipped absolute deviation method · journal · August 2007
A negative binomial model for time series of counts · journal · July 2009
Improved statistical tests for differential gene expression by shrinking variance components estimates · journal · December 2004
Small-sample estimation of negative binomial dispersion, with applications to SAGE data · journal · July 2007
Variance component score test for time-course gene set analysis of longitudinal RNA-seq data · journal · March 2017
An Optimal Test with Maximum Average Power While Controlling FDR with Application to RNA-Seq Data: AMAP Test for RNA-Seq Data · journal · July 2013
The optimal discovery procedure: a new approach to simultaneous significance testing · journal · June 2007
Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing · journal · January 1995
Time Series Expression Analyses Using RNA-seq: A Statistical Approach · journal · January 2013
Differential expression analysis for sequence count data · journal · October 2010
voom: precision weights unlock linear model analysis tools for RNA-seq read counts · journal · January 2014
Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model · journal · August 2016
GPrank: an R package for detecting dynamic elements from genome-wide time series · journal · October 2018
deGPS is a powerful tool for detecting differential expression in RNA-sequencing studies · journal · June 2015
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 · journal · December 2014
stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage · journal · August 2017
Variable Selection in Finite Mixture of Regression Models · journal · September 2007
Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression · journal · June 2016
On the Identifiability of Finite Mixtures · journal · February 1968
Identifiability of Finite Mixtures · journal · December 1963
Optimal Rate of Convergence for Finite Mixture Models · journal · February 1995
Natural Cubic Spline Regression Modeling Followed by Dynamic Network Reconstruction for the Identification of Radiation-Sensitivity Gene Association Networks from Time-Course Transcriptome Data · journal · August 2016
Optimal Tests Shrinking Both Means and Variances Applicable to Microarray Data Analysis · journal · January 2010
Model diagnostics for smoothing spline ANOVA models · journal · December 2004
A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data · journal · February 2012