Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report.
This SAND report summarizes the activities and outcomes of the Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) LDRD project, which addressed improving the accuracy of conditional random fields for named entity recognition through the use of ensemble methods. Conditional random fields (CRFs) are powerful, flexible probabilistic graphical models often used in supervised machine learning prediction tasks associated with sequence data. Specifically, they are currently the best known option for named entity recognition (NER) in text. NER is the process of labeling words in sentences with semantic identifiers such as “person”, “date”, or “organization”. Ensembles are a powerful statistical inference meta-method that can make most supervised machine learning methods more accurate, faster, or both. Ensemble methods are normally best suited to “unstable” classification methods with high variance error. CRFs applied to NER are very stable classifiers, and as such, would initially seem to be resistant to the benefits of ensembles. The NEEEEIT project nonetheless worked out how to generalize ensemble methods to CRFs, demonstrated that accuracy can indeed be improved by proper use of ensemble techniques, and generated a new CRF code, “pyCrust”, and a surrounding application environment, “NEEEEIT”, which implement those improvements. The summary practical advice that …
- Publication Date:
- OSTI Identifier:
- Report Number(s):
- DOE Contract Number:
- Resource Type: Technical Report
- Research Org: Sandia National Laboratories, Albuquerque, NM; Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
- Sponsoring Org: USDOE National Nuclear Security Administration (NNSA)
- Country of Publication: United States