Characterization and Improvement of RNA-Seq Precision in Quantitative Transcript Expression Profiling
Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, regardless of whether the experimental design incorporates replicates. With the compilation of large-scale RNA-Seq data sets that include technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This in turn allows us to consider how that precision can be improved by computational or experimental means.

Results: We report on a comprehensive study of target coverage and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive target coverage of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, fewer than 30% of all transcripts could be quantified reliably with a relative error below 20%. Based on established tools, we then introduce a new approach for mapping and analyzing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the share of transcripts that can be quantified reliably to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In the discussion, we outline possible experimental and computational strategies for further improvements in quantification precision.
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1023181
- Report Number(s): PNNL-SA-77408; KP1704020
- Journal Information: Bioinformatics, 27(13):I383-I391; Journal Name: Bioinformatics; Journal Volume: 27; Journal Issue: 13
- Country of Publication: United States
- Language: English