DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: APT malware static trace analysis through bigrams and graph edit distance

Abstract

Research and business organizations are vulnerable to attack by malware, particularly advanced persistent threat (APT) malware tailored for a specific target. Malware identification is made more difficult because samples can be subtly altered to avoid detection by methods that check for an identical match to known code. Different versions of an original piece of malware form a malware family. When new malicious software is identified, reverse engineers seek to identify its origin and purpose, and knowing whether it comes from a known family or a previously unobserved one makes their work more efficient. This article presents a three-stage method to classify new malware into a family by comparing its static trace with existing static traces and assigning it to the most similar family. First, a fast filtering method creates a shortlist of samples with some similarity to the new malware, using a simple bigram comparison of the instructions. The second stage takes the call graph view of the shortlisted static traces and uses simulated annealing to estimate the graph edit distance, a measure of dissimilarity between graphs. Finally, a random forest classifier combines the results of the previous two stages to predict the family to which a new sample belongs. The paper also considers how to detect when malware is from a new family.
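The first two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighted Jaccard score over bigram counts, the unlabeled-node graph model with unit edit costs, the swap-based annealing schedule, and all function and parameter names (`bigram_similarity`, `shortlist`, `estimate_ged`, `steps`, `temp`, `cooling`) are assumptions made for illustration. The third stage (a random forest over the two scores) is omitted.

```python
import math
import random
from collections import Counter

def bigram_counts(instructions):
    """Count adjacent instruction pairs (bigrams) in a static trace."""
    return Counter(zip(instructions, instructions[1:]))

def bigram_similarity(trace_a, trace_b):
    """Weighted Jaccard similarity of bigram counts, in [0, 1] (assumed metric)."""
    a, b = bigram_counts(trace_a), bigram_counts(trace_b)
    union = sum((a | b).values())          # sum of element-wise maxima
    return sum((a & b).values()) / union if union else 1.0

def shortlist(new_trace, known, k=2):
    """Names of the k known samples most similar to new_trace."""
    ranked = sorted(known,
                    key=lambda name: bigram_similarity(new_trace, known[name]),
                    reverse=True)
    return ranked[:k]

def edit_cost(perm, edges_a, edges_b, size_gap):
    """Unit-cost edits implied by mapping node i of graph A to perm[i] of graph B."""
    mapped = {(perm[u], perm[v]) for u, v in edges_a}
    # size_gap counts node insertions/deletions; the symmetric difference
    # counts edges that must be deleted from A or inserted to reach B.
    return size_gap + len(mapped ^ edges_b)

def estimate_ged(graph_a, graph_b, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Upper-bound estimate of graph edit distance via simulated annealing.

    Each graph is (node_count, set_of_directed_edges) with nodes 0..n-1.
    The smaller graph is implicitly padded with isolated dummy nodes.
    """
    (n_a, edges_a), (n_b, edges_b) = graph_a, graph_b
    n = max(n_a, n_b)
    size_gap = abs(n_a - n_b)
    rng = random.Random(seed)
    perm = list(range(n))
    cost = best = edit_cost(perm, edges_a, edges_b, size_gap)
    if n < 2:
        return best
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]            # propose a swap
        new_cost = edit_cost(perm, edges_a, edges_b, size_gap)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                            # accept the move
            best = min(best, cost)
        else:
            perm[i], perm[j] = perm[j], perm[i]        # reject: undo the swap
        temp *= cooling
    return best
```

In this sketch, `shortlist` would be run first on instruction traces, and `estimate_ged` only on the call graphs of the shortlisted samples, which mirrors the cheap-filter-then-expensive-comparison ordering the abstract describes. The annealing result is an upper bound on the true edit distance under the assumed cost model, since every node mapping corresponds to a valid edit sequence.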

Authors:
 Bolton, Alexander D. [1]; Anderson-Cook, Christine M. [2]
  1. Imperial College, London (United Kingdom)
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
2017-05-17
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1364539
Report Number(s):
LA-UR-16-24029
Journal ID: ISSN 1932-1864
Grant/Contract Number:  
AC52-06NA25396
Resource Type:
Accepted Manuscript
Journal Name:
Statistical Analysis and Data Mining
Additional Journal Information:
Journal Volume: 10; Journal Issue: 3; Journal ID: ISSN 1932-1864
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; call graph; family detection; malware detection; random forest; simulated annealing

Citation Formats

Bolton, Alexander D., and Anderson-Cook, Christine M. APT malware static trace analysis through bigrams and graph edit distance. United States: N. p., 2017. Web. doi:10.1002/sam.11346.
Bolton, Alexander D., & Anderson-Cook, Christine M. APT malware static trace analysis through bigrams and graph edit distance. United States. https://doi.org/10.1002/sam.11346
Bolton, Alexander D., and Anderson-Cook, Christine M. 2017. "APT malware static trace analysis through bigrams and graph edit distance". United States. https://doi.org/10.1002/sam.11346. https://www.osti.gov/servlets/purl/1364539.
@article{osti_1364539,
title = {APT malware static trace analysis through bigrams and graph edit distance},
author = {Bolton, Alexander D. and Anderson-Cook, Christine M.},
abstractNote = {Research and business organizations are vulnerable to attack by malware, particularly advanced persistent threat (APT) malware tailored for a specific target. Malware identification is made more difficult because samples can be subtly altered to avoid detection by methods that check for an identical match to known code. Different versions of an original piece of malware form a malware family. When new malicious software is identified, reverse engineers seek to identify its origin and purpose, and knowing whether it comes from a known family or a previously unobserved one makes their work more efficient. This article presents a three-stage method to classify new malware into a family by comparing its static trace with existing static traces and assigning it to the most similar family. First, a fast filtering method creates a shortlist of samples with some similarity to the new malware, using a simple bigram comparison of the instructions. The second stage takes the call graph view of the shortlisted static traces and uses simulated annealing to estimate the graph edit distance, a measure of dissimilarity between graphs. Finally, a random forest classifier combines the results of the previous two stages to predict the family to which a new sample belongs. The paper also considers how to detect when malware is from a new family.},
doi = {10.1002/sam.11346},
journal = {Statistical Analysis and Data Mining},
number = 3,
volume = 10,
place = {United States},
year = {2017},
month = {5}
}
