Automatic Labeling for Entity Extraction in Cyber Security
Conference · OSTI ID:1143555 · ORNL
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications such as detecting security-related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data, and we provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the averaged perceptron on a portion of our corpus (~750,000 words) and achieve near-perfect precision, recall, and accuracy, with training times under 17 seconds.
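The abstract does not spell out the labeling procedure, so the following is only a minimal sketch of one way structured data could be used to label text automatically: a longest-match lookup of token sequences against a gazetteer built from structured records. The record fields (`vendor`, `product`, `version`), the sample values, the B-/I-/O tagging scheme, and the function names `build_gazetteer` and `auto_label` are all assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical structured records; the paper's real data sources and
# label set are not given in the abstract.
STRUCTURED_RECORDS = [
    {"vendor": "Adobe", "product": "Flash Player", "version": "11.2"},
    {"vendor": "Microsoft", "product": "Internet Explorer", "version": "8"},
]


def build_gazetteer(records):
    """Map each known (lowercased) token sequence to an entity label."""
    gazetteer = {}
    for record in records:
        for label, value in record.items():
            gazetteer[tuple(value.lower().split())] = label
    return gazetteer


def auto_label(tokens, gazetteer):
    """Tag tokens with B-/I-/O labels using longest-match gazetteer lookup."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                labels[i] = "B-" + label
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


if __name__ == "__main__":
    sentence = "Adobe Flash Player 11.2 allows remote attackers".split()
    gaz = build_gazetteer(STRUCTURED_RECORDS)
    for token, label in zip(sentence, auto_label(sentence, gaz)):
        print(token, label)
```

Token/label pairs produced this way could serve as training data for a sequence classifier such as the Maximum Entropy Model the abstract mentions. A plain lookup like this one will mislabel ambiguous short strings (for example a bare version number), so a practical labeler would need additional constraints.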
- Research Organization: Oak Ridge National Laboratory (ORNL)
- Sponsoring Organization: ORNL work for others
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1143555
- Country of Publication: United States
- Language: English
Similar Records
Cybersecurity Automated Information Extraction Techniques: Drawbacks of Current Methods, and Enhanced Extractors
Conference · Sun Dec 31 23:00:00 EST 2017 · OSTI ID:1424492
Creating Training Data for Scientific Named Entity Recognition with Minimal Human Effort
Conference · Mon Dec 31 23:00:00 EST 2018 · OSTI ID:1558659
Towards a Relation Extraction Framework for Cyber-Security Concepts
Conference · Wed Dec 31 23:00:00 EST 2014 · OSTI ID:1185925