OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Information Content of Discrete Functions and Their Application in Genetic Data Analysis

Abstract

The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows the detection problem to be separated from the problem of inferring functional form. We approach here the third component, inferring functional forms based on information encoded in the functions. We present a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis: the inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. Finally, we illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise.
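As a rough illustration of what "information contained in a function" can mean, the sketch below computes a few standard Shannon measures for a discrete function. It is not the authors' classification method. It assumes the three variables are two inputs X and Y plus an output Z = f(X, Y), and that the inputs are uniform and independent; the helper names (`entropy`, `mutual_information`, `function_profile`) are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), estimated from paired samples."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def function_profile(f, alphabet):
    """Information measures for Z = f(X, Y).

    Assumption for this sketch: X and Y are uniform and independent,
    so every input pair from the alphabet is enumerated exactly once.
    """
    pairs = list(product(alphabet, repeat=2))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    zs = [f(x, y) for x, y in pairs]
    return {
        "H(Z)": entropy(zs),
        "I(X;Z)": mutual_information(xs, zs),
        "I(Y;Z)": mutual_information(ys, zs),
        "I(X,Y;Z)": mutual_information(pairs, zs),
    }


# Two binary functions with very different information profiles.
xor_profile = function_profile(lambda x, y: x ^ y, [0, 1])
and_profile = function_profile(lambda x, y: x & y, [0, 1])

for name, profile in (("XOR", xor_profile), ("AND", and_profile)):
    print(name, {k: round(v, 4) for k, v in profile.items()})
```

Under these assumptions, XOR shares no information with either input alone but one full bit with the pair, while AND shares some information with each input separately. Differences of this kind are one plausible way to distinguish function classes by their information content.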

Authors:
Sakhanenko, Nikita A. [1]; Kunert-Graf, James [1]; Galas, David J. [1]
  1. Pacific Northwest Research Inst., Seattle, WA (United States)
Publication Date:
2017-10-13
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE; Bill and Melinda Gates Foundation; National Institutes of Health (NIH)
OSTI Identifier:
1422783
Grant/Contract Number:  
AC0576RL01830
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Computational Biology
Additional Journal Information:
Journal Volume: 24; Journal Issue: 12; Journal ID: ISSN 1557-8666
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES

Citation Formats

Sakhanenko, Nikita A., Kunert-Graf, James, and Galas, David J. The Information Content of Discrete Functions and Their Application in Genetic Data Analysis. United States: N. p., 2017. Web. doi:10.1089/CMB.2017.0143.
Sakhanenko, Nikita A., Kunert-Graf, James, & Galas, David J. (2017). The Information Content of Discrete Functions and Their Application in Genetic Data Analysis. United States. doi:10.1089/CMB.2017.0143.
Sakhanenko, Nikita A., Kunert-Graf, James, and Galas, David J. 2017. "The Information Content of Discrete Functions and Their Application in Genetic Data Analysis". United States. doi:10.1089/CMB.2017.0143. https://www.osti.gov/servlets/purl/1422783.
@article{osti_1422783,
title = {The Information Content of Discrete Functions and Their Application in Genetic Data Analysis},
author = {Sakhanenko, Nikita A. and Kunert-Graf, James and Galas, David J.},
abstractNote = {The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows the detection problem to be separated from the problem of inferring functional form. We approach here the third component, inferring functional forms based on information encoded in the functions. We present a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis: the inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. Finally, we illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise.},
doi = {10.1089/CMB.2017.0143},
journal = {Journal of Computational Biology},
number = 12,
volume = 24,
place = {United States},
year = {2017},
month = {10}
}
