Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks
- Univ. of California, Berkeley, CA (United States). Dept. of Statistics; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Division of Environmental Genomics and Systems Biology
- Univ. of California, Berkeley, CA (United States). Dept. of Statistics; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Division of Environmental Genomics and Systems Biology; Walmart Labs, San Bruno, CA (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Division of Environmental Genomics and Systems Biology
- Univ. of California, Berkeley, CA (United States). Dept. of Statistics; Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences
Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes it a challenge to extract biological information. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span along the embryonic anterior-posterior axis. Using a two-tail 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. In conclusion, the performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.
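The stability-driven selection idea in the abstract can be illustrated with a minimal sketch. This is not the staNMF implementation, and this record does not spell out the paper's exact criterion. The sketch assumes a plain multiplicative-update NMF and a simplified instability score, namely the average best-match cosine dissimilarity between dictionaries learned from different random initializations. All function names (`nmf`, `dissimilarity`, `instability`) and the synthetic data are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0):
    """Plain NMF via multiplicative updates: X ~ W @ H with W, H >= 0.
    (Assumed stand-in; the paper uses its own scalable implementation.)"""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def dissimilarity(W1, W2):
    """Simplified dictionary distance: near 0 when the columns of W1 and W2
    match up to permutation and scaling, larger when they do not."""
    A = W1 / (np.linalg.norm(W1, axis=0) + 1e-12)
    B = W2 / (np.linalg.norm(W2, axis=0) + 1e-12)
    C = np.abs(A.T @ B)  # k x k matrix of cosine similarities
    return 1.0 - 0.5 * (C.max(axis=0).mean() + C.max(axis=1).mean())

def instability(X, k, n_runs=5):
    """Mean pairwise dissimilarity of dictionaries over random restarts."""
    dicts = [nmf(X, k, seed=s)[0] for s in range(n_runs)]
    scores = [dissimilarity(dicts[i], dicts[j])
              for i in range(n_runs) for j in range(i + 1, n_runs)]
    return float(np.mean(scores))

# Synthetic nonnegative data with three underlying patterns (illustrative only).
rng = np.random.default_rng(42)
X = rng.random((40, 3)) @ rng.random((3, 30)) + 0.01 * rng.random((40, 30))
for k in (2, 3, 4, 5):
    print(k, round(instability(X, k), 4))
```

Under this sketch, a candidate number of patterns would be preferred when its instability score is low, meaning that repeated runs recover essentially the same dictionary. The score used here is an assumption chosen for brevity and may differ from the one behind the 21 principal patterns reported above.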
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC); National Science Foundation (NSF); US Air Force Office of Scientific Research (AFOSR); National Institutes of Health (NIH)
- Grant/Contract Number:
- AC02-05CH11231; CCF-0939370; R01 GM076655; R01 GM097231; 1U01HG007031-01
- OSTI ID:
- 1379291
- Journal Information:
- Proceedings of the National Academy of Sciences of the United States of America, Vol. 113, Issue 16; ISSN 0027-8424
- Publisher:
- National Academy of Sciences, Washington, DC (United States)
- Country of Publication:
- United States
- Language:
- English
Similar Records
Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape
Global analysis of patterns of gene expression during Drosophila embryogenesis