U.S. Department of Energy
Office of Scientific and Technical Information

Efficient Binary Static Code Data Flow Analysis Using Unsupervised Learning

Technical Report · DOI: https://doi.org/10.2172/1592974 · OSTI ID: 1592974
 [1];  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
The ever-increasing need to ensure that code is reliably, efficiently, and safely constructed has fueled the evolution of popular static binary code analysis tools. To identify potential coding flaws in binaries, tools such as IDA Pro are used to disassemble them into an opcode/assembly language format in support of manual static code analysis. Because analyzing large binaries is highly manual and resource-intensive, the probability of overlooking potential coding irregularities and inefficiencies is quite high. This paper describes a lightweight, unsupervised data flow methodology that uses highly correlated data flow graphs (CDFGs) to identify coding irregularities while minimizing analysis time and required computing resources. These gains in analysis accuracy and efficiency are achieved by combining graph analysis with unsupervised machine learning techniques, which allows an analyst to focus on the most statistically significant flow patterns while performing binary static code analysis.
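The abstract does not specify the algorithm, so the following is only a minimal sketch of the general idea (derive data flow features per function, correlate them, and surface the least typical function), not the report's method. The toy instruction format, the `FUNCTIONS` data, and the helper names `data_flow_edges`, `pearson`, and `rank_by_mean_correlation` are all hypothetical, and plain pairwise Pearson correlation stands in for whatever unsupervised technique the authors actually used.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy input: each function is a list of
# (destination register, opcode, source registers) tuples.
FUNCTIONS = {
    "f_a": [("r1", "mov", ["r0"]), ("r2", "add", ["r1", "r0"]), ("r3", "mov", ["r2"])],
    "f_b": [("r1", "mov", ["r0"]), ("r2", "add", ["r1", "r0"]), ("r4", "mov", ["r2"])],
    "f_c": [("r1", "xor", ["r0", "r0"]), ("r2", "shl", ["r1"]), ("r3", "xor", ["r2", "r1"])],
}


def data_flow_edges(instrs):
    """Count def-use edges as (producer opcode, consumer opcode) pairs."""
    last_def = {}  # register -> opcode of the instruction that last wrote it
    edges = Counter()
    for dest, opcode, sources in instrs:
        for src in sources:
            if src in last_def:
                edges[(last_def[src], opcode)] += 1
        last_def[dest] = opcode
    return edges


def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    denom = sqrt(vx * vy)
    return cov / denom if denom else 0.0


def rank_by_mean_correlation(functions):
    """Return (name, score) pairs sorted so the least correlated function comes first."""
    edge_counts = {name: data_flow_edges(instrs) for name, instrs in functions.items()}
    keys = sorted(set().union(*edge_counts.values()))
    vectors = {name: [counts[k] for k in keys] for name, counts in edge_counts.items()}
    scores = {}
    for name, vec in vectors.items():
        others = [pearson(vec, v) for other, v in vectors.items() if other != name]
        scores[name] = sum(others) / len(others) if others else 0.0
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, score in rank_by_mean_correlation(FUNCTIONS):
        print(name, round(score, 3))
```

In this toy data, `f_a` and `f_b` share the same data flow pattern while `f_c` differs, so `f_c` is ranked first as the candidate an analyst might inspect. A realistic pipeline would take its edges from a disassembler's output and would likely use a proper clustering or anomaly detection method in place of this ranking.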
Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC04-94AL85000; NA0003525
OSTI ID:
1592974
Report Number(s):
SAND--2019-14311R; 682701
Country of Publication:
United States
Language:
English
