OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Automatic Labeling for Entity Extraction in Cyber Security

Abstract

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications such as detecting security-related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a high-precision method that automatically labels text from several data sources by leveraging related, domain-specific, structured data, and we provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the averaged perceptron on a portion of our corpus (~750,000 words) and achieve near-perfect precision, recall, and accuracy, with training times under 17 seconds.
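The abstract pairs automatic labeling from structured data with a Maximum Entropy Model trained by the averaged perceptron. The sketch below illustrates that two-step pipeline under loudly stated assumptions: the hypothetical `GAZETTEER` dictionary stands in for the "related, domain-specific, structured data", the labels `VENDOR`/`PRODUCT` and the feature set are invented for illustration, and the model is a generic per-token multiclass averaged perceptron, not the authors' implementation, features, or corpus.

```python
import random
from collections import defaultdict

# Hypothetical gazetteer standing in for structured data (e.g. vendor/product names
# from a vulnerability feed). Entries and label names are illustrative only.
GAZETTEER = {
    ("microsoft",): "VENDOR",
    ("internet", "explorer"): "PRODUCT",
    ("adobe",): "VENDOR",
    ("flash", "player"): "PRODUCT",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def auto_label(tokens):
    """Label tokens by greedy longest match against the gazetteer; others get 'O'."""
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            tag = GAZETTEER.get(tuple(lowered[i:i + n]))
            if tag:
                labels[i:i + n] = [tag] * n
                i += n
                break
        else:  # no gazetteer entry starts at position i
            i += 1
    return labels

def features(tokens, i):
    """Sparse binary features for token i: current word, previous word, word shape."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["bias", "w=" + word.lower(), "prev=" + prev,
            "title=" + str(word.istitle()), "digit=" + str(word.isdigit())]

class AveragedPerceptron:
    """Multiclass linear model whose final weights are averaged over all training steps."""
    def __init__(self, classes):
        self.classes = sorted(classes)
        self.w = defaultdict(float)       # (feature, class) -> current weight
        self.totals = defaultdict(float)  # running sum of past weights, updated lazily
        self.stamps = defaultdict(int)    # step at which each weight last changed
        self.step = 0                     # number of training instances processed

    def predict(self, feats, weights=None):
        weights = self.w if weights is None else weights
        scores = {c: sum(weights.get((f, c), 0.0) for f in feats) for c in self.classes}
        return max(self.classes, key=lambda c: scores[c])

    def _bump(self, key, delta):
        # Credit the old weight for every step it was in force, then change it.
        self.totals[key] += (self.step - self.stamps[key]) * self.w[key]
        self.stamps[key] = self.step
        self.w[key] += delta

    def update(self, feats, gold):
        guess = self.predict(feats)
        if guess != gold:  # perceptron rule: promote the gold class, demote the guess
            for f in feats:
                self._bump((f, gold), 1.0)
                self._bump((f, guess), -1.0)
        self.step += 1
        return guess

    def averaged(self):
        steps = max(self.step, 1)
        return {key: (self.totals[key] + (self.step - self.stamps[key]) * wt) / steps
                for key, wt in self.w.items()}

def train(sentences, epochs=5, seed=0):
    """Auto-label raw sentences, then fit the perceptron on those silver labels."""
    data = [(toks, auto_label(toks)) for toks in sentences]
    model = AveragedPerceptron(set(GAZETTEER.values()) | {"O"})
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for toks, labels in data:
            for i, gold in enumerate(labels):
                model.update(features(toks, i), gold)
    return model, model.averaged()

def tag(model, weights, tokens):
    return [model.predict(features(tokens, i), weights) for i in range(len(tokens))]

corpus = [
    "Microsoft Internet Explorer 8 allows remote attackers to execute code".split(),
    "A buffer overflow in Adobe Flash Player allows code execution".split(),
    "Adobe released a patch for Flash Player".split(),
    "Microsoft released a patch for Internet Explorer".split(),
]
model, avg_weights = train(corpus)
print(tag(model, avg_weights, "Adobe Flash Player crashed".split()))
```

The split mirrors the two steps the abstract names: `auto_label` produces training labels with no manual annotation, and the perceptron learns weights over context features from them. Averaging the weights over every update is the usual way to keep a perceptron's output from depending too heavily on the last few examples it saw.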

Authors:
Bridges, Robert A [1]; Jones, Corinne L [1]; Iannacone, Michael D [1]; Testa, Kelly M [1]; Goodall, John R [1]
  1. ORNL
Publication Date:
2014
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
Work for Others (WFO)
OSTI Identifier:
1143555
DOE Contract Number:  
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2014 ASE International Conference on Cyber Security, Stanford, CA, USA, May 27, 2014; March 31, 2014
Country of Publication:
United States
Language:
English
Subject:
Entity Extraction; Automatic Labeling; Maximum Entropy Model; Averaged Perceptron; Cyber Security

Citation Formats

MLA: Bridges, Robert A, Jones, Corinne L, Iannacone, Michael D, Testa, Kelly M, and Goodall, John R. Automatic Labeling for Entity Extraction in Cyber Security. United States: N. p., 2014. Web.
APA: Bridges, Robert A, Jones, Corinne L, Iannacone, Michael D, Testa, Kelly M, & Goodall, John R. Automatic Labeling for Entity Extraction in Cyber Security. United States.
Chicago: Bridges, Robert A, Jones, Corinne L, Iannacone, Michael D, Testa, Kelly M, and Goodall, John R. 2014. "Automatic Labeling for Entity Extraction in Cyber Security". United States.
BibTeX:
@inproceedings{osti_1143555,
title = {Automatic Labeling for Entity Extraction in Cyber Security},
author = {Bridges, Robert A and Jones, Corinne L and Iannacone, Michael D and Testa, Kelly M and Goodall, John R},
booktitle = {2014 ASE International Conference on Cyber Security},
abstractNote = {Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications such as detecting security-related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a high-precision method that automatically labels text from several data sources by leveraging related, domain-specific, structured data, and we provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the averaged perceptron on a portion of our corpus (~750,000 words) and achieve near-perfect precision, recall, and accuracy, with training times under 17 seconds.},
url = {https://www.osti.gov/biblio/1143555},
place = {United States},
year = {2014}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
