APT malware static trace analysis through bigrams and graph edit distance
Abstract
Research and business organizations are vulnerable to attack by malware, particularly advanced persistent threat malware tailored for a specific target. Malware identification is made more difficult because samples can be subtly altered to avoid detection by methods that check for an identical match to known code. Different versions of an original piece of malware form a malware family. When new malicious software is identified, reverse engineers seek to identify its origin and purpose. Knowing whether new malware is from a known family or a previously unobserved family aids the efficiency of reverse engineers. This article presents a three-stage method to classify new malware into a family by comparing its similarity to existing static traces, and assigning it to the most similar family. First, a fast filtering method creates a shortlist of samples with some similarity to the new malware, using a simple bigram comparison of the instructions. The second stage takes the call graph view of the shortlisted static traces and uses simulated annealing to estimate the graph edit distance, a measure of dissimilarity between graphs. Finally, a random forest classifier combines the previous two results to predict the family to which a new sample belongs. Our paper also considers how to detect when malware is from a new family.
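The first stage described in the abstract, a bigram-based filter that shortlists similar samples, can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors use, so the multiset Jaccard score below is an assumption, as are the function names, the toy instruction traces, and the `threshold` value.

```python
from collections import Counter


def instruction_bigrams(trace):
    """Count adjacent instruction pairs in a static trace (a list of mnemonics)."""
    return Counter(zip(trace, trace[1:]))


def bigram_similarity(trace_a, trace_b):
    """Multiset Jaccard similarity of two traces' bigram counts.

    Assumption: the source only says "a simple bigram comparison";
    this particular score is chosen for illustration.
    """
    a = instruction_bigrams(trace_a)
    b = instruction_bigrams(trace_b)
    union = sum((a | b).values())  # per-bigram maximum counts
    if union == 0:
        return 0.0
    return sum((a & b).values()) / union  # per-bigram minimum counts


def shortlist(new_trace, known_traces, threshold=0.3):
    """Return (name, score) pairs scoring at least `threshold`, best first.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    scored = [
        (name, bigram_similarity(new_trace, trace))
        for name, trace in known_traces.items()
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data: sample names and instruction sequences are made up.
known = {
    "family_a_v1": ["push", "mov", "call", "pop", "ret"],
    "family_b_v1": ["xor", "add", "jmp", "xor", "add"],
}
new_sample = ["push", "mov", "call", "mov", "pop", "ret"]
candidates = shortlist(new_sample, known)
```

In this sketch only the shortlisted candidates would be passed on to the more expensive second stage, the graph edit distance estimate on call graphs, which is the point of running a cheap filter first.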
- Authors:
- Bolton, Alexander D.; Anderson-Cook, Christine M.
- Imperial College, London (United Kingdom)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Publication Date:
- 2017-05-17
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1364539
- Report Number(s):
- LA-UR-16-24029
- Journal ID: ISSN 1932-1864
- Grant/Contract Number:
- AC52-06NA25396
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Statistical Analysis and Data Mining
- Additional Journal Information:
- Journal Volume: 10; Journal Issue: 3; Journal ID: ISSN 1932-1864
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; call graph; family detection; malware detection; random forest; simulated annealing
Citation Formats
Bolton, Alexander D., and Anderson-Cook, Christine M. APT malware static trace analysis through bigrams and graph edit distance. United States: N. p., 2017. Web. doi:10.1002/sam.11346.
Bolton, Alexander D., & Anderson-Cook, Christine M. APT malware static trace analysis through bigrams and graph edit distance. United States. https://doi.org/10.1002/sam.11346
Bolton, Alexander D., and Anderson-Cook, Christine M. 2017. "APT malware static trace analysis through bigrams and graph edit distance". United States. https://doi.org/10.1002/sam.11346. https://www.osti.gov/servlets/purl/1364539.
@article{osti_1364539,
title = {APT malware static trace analysis through bigrams and graph edit distance},
author = {Bolton, Alexander D. and Anderson-Cook, Christine M.},
abstractNote = {Research and business organizations are vulnerable to attack by malware, particularly advanced persistent threat malware tailored for a specific target. Malware identification is made more difficult because samples can be subtly altered to avoid detection by methods that check for an identical match to known code. Different versions of an original piece of malware form a malware family. When new malicious software is identified, reverse engineers seek to identify its origin and purpose. Knowing whether new malware is from a known family or a previously unobserved family aids the efficiency of reverse engineers. This article presents a three-stage method to classify new malware into a family by comparing its similarity to existing static traces, and assigning it to the most similar family. First, a fast filtering method creates a shortlist of samples with some similarity to the new malware, using a simple bigram comparison of the instructions. The second stage takes the call graph view of the shortlisted static traces and uses simulated annealing to estimate the graph edit distance, a measure of dissimilarity between graphs. Finally, a random forest classifier combines the previous two results to predict the family to which a new sample belongs. Our paper also considers how to detect when malware is from a new family.},
doi = {10.1002/sam.11346},
journal = {Statistical Analysis and Data Mining},
number = 3,
volume = 10,
place = {United States},
year = {2017},
month = {5}
}