U.S. Department of Energy
Office of Scientific and Technical Information

Binary Analysis with Architecture and Code Section Detection using Supervised Machine Learning

Conference
When presented with an unknown binary, which may or may not be complete, the ability to determine information about it is critical to subsequent reverse engineering, particularly for discovering the binary’s intended use and potentially malicious nature. This paper details techniques both to identify the machine architecture of a binary and to locate the important code segments within the file. The approach, which we call “What is it Binary” or WiiBin, combines byte histograms with various machine learning (ML) techniques. Byte histograms are simple to calculate and do not rely on file headers or metadata, allowing for acceptable results when only a small portion of the original file is available. Utilizing WiiBin, we were able to accurately (>80%) determine the architecture of test binaries with as little as a 20% contiguous portion of the file present. We were also able to determine the location of code sections within a binary using the WiiBin framework. Ultimately, the more information that can be gleaned from a binary file, the easier it is to reverse engineer successfully.
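The byte-histogram feature described in the abstract can be illustrated with a minimal sketch. This is not the WiiBin implementation: the function names, the normalization to relative frequencies, and the window and step sizes are all assumptions made for illustration. The sketch shows how a 256-bin histogram might be computed from a contiguous fragment of a file, and how fixed-size windows could yield per-region histograms that a separately trained classifier might use to flag code sections.

```python
from collections import Counter
from typing import Iterator


def byte_histogram(data: bytes) -> list[float]:
    """Return a 256-bin histogram of relative byte frequencies.

    Normalizing by length is an assumption; it makes fragments of
    different sizes comparable as classifier input.
    """
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def contiguous_fragment(data: bytes, fraction: float, offset: int = 0) -> bytes:
    """Return a contiguous slice covering roughly `fraction` of the data.

    Hypothetical helper that mimics having only part of a file available.
    """
    length = max(1, int(len(data) * fraction))
    return data[offset:offset + length]


def windowed_histograms(
    data: bytes, window: int = 1024, step: int = 512
) -> Iterator[tuple[int, list[float]]]:
    """Yield (start_offset, histogram) for each window over the data.

    The window and step sizes are illustrative values, not taken from
    the paper. Input shorter than one window yields a single histogram.
    """
    last_start = max(len(data) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, byte_histogram(data[start:start + window])
```

Each histogram is a fixed-length vector regardless of fragment size and requires no header parsing, which is why such features remain usable on partial files. The resulting vectors could be passed to any supervised classifier; the choice of classifier is left open here.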
Research Organization:
Idaho National Laboratory (INL), Idaho Falls, ID (United States)
Sponsoring Organization:
USDOE Office of Nuclear Energy (NE)
DOE Contract Number:
AC07-05ID14517
OSTI ID:
1968804
Report Number(s):
INL/CON-21-64457-Rev000
Country of Publication:
United States
Language:
English

Similar Records

WiiBin
Software · Wed Sep 16 20:00:00 EDT 2020 · OSTI ID:code-45069

Detection of malicious computer executables
Patent · Tue Apr 14 00:00:00 EDT 2009 · OSTI ID:986572

Deep PDF parsing to extract features for detecting embedded malware.
Technical Report · Thu Sep 01 00:00:00 EDT 2011 · OSTI ID:1030303