This content will become publicly available on September 15, 2018

Title: Order priors for Bayesian network discovery with an application to malware phylogeny

Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure by combining human knowledge about the partial ordering of variables with statistical inference of conditional dependencies from observed data. These two sources of information are complementary, so the resulting networks both reflect human beliefs about the system and fit the observed data. Applying prior beliefs about partial orderings of variables is distinctly different from existing methods, which incorporate prior beliefs about direct dependencies (or edges) in a Bayesian network. We provide an efficient implementation of the partial-order prior, as well as an edge prior, in a Bayesian structure discovery algorithm, and show that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as much as the edge prior does, even though order priors are more general. Our primary motivation is characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that other algorithms miss.
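The local modularity property mentioned in the abstract can be illustrated with a minimal sketch: a structure prior is modular when its log factors into one term per node that depends only on that node's parent set. The example below is an assumption made for illustration, not the paper's definition of the partial-order prior. The names `local_log_prior`, `log_prior`, `must_precede`, and the `penalty` value are all hypothetical, and the sketch checks only the listed pairs against direct parents, not their transitive closure.

```python
def local_log_prior(node, parents, must_precede, penalty=2.0):
    """Log-prior term for one node given its parent set.

    must_precede is a set of (earlier, later) pairs stating a belief that
    `earlier` comes before `later` in the ordering. A parent that the
    belief places after the child contradicts it and costs `penalty`
    (a hypothetical strength parameter).
    """
    violations = sum((node, p) in must_precede for p in parents)
    return -penalty * violations


def log_prior(graph, must_precede, penalty=2.0):
    """Unnormalised log prior of a DAG given as {node: set_of_parents}.

    The total is a sum of per-node terms, which is what makes the
    prior modular: changing one node's parents changes one term.
    """
    return sum(
        local_log_prior(node, parents, must_precede, penalty)
        for node, parents in graph.items()
    )


# Hypothetical beliefs: A precedes B, and B precedes C.
order_beliefs = {("A", "B"), ("B", "C")}

# One graph that agrees with the beliefs, one with the A-B edge reversed.
consistent = {"A": set(), "B": {"A"}, "C": {"B"}}
reversed_edge = {"A": {"B"}, "B": set(), "C": {"B"}}
```

Under these assumptions, `log_prior(consistent, order_beliefs)` incurs no penalty, while `log_prior(reversed_edge, order_beliefs)` is lowered by one penalty because the edge from B into A contradicts the belief that A precedes B. Because each term depends on a single node and its parents, a search procedure could rescore only the node whose parent set changed.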
[1]; [2]; [1]; [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. Cisco Systems Inc., Durham, NC (United States)
Publication Date:
Report Number(s):
Journal ID: ISSN 1932-1864
Grant/Contract Number:
Accepted Manuscript
Journal Name:
Statistical Analysis and Data Mining
Additional Journal Information:
Journal Volume: 10; Journal Issue: 5; Journal ID: ISSN 1932-1864
Research Org:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org:
USDOE Laboratory Directed Research and Development (LDRD) Program
Country of Publication:
United States
97 MATHEMATICS AND COMPUTING; Bayesian networks; cyber security; malware; probabilistic graphical models
OSTI Identifier: