DOE PAGES

Title: EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual sample annotation is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. A comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15–25% and helps curators detect terms that would otherwise have been missed.
Authors:
 [1] ;  [2] ;  [3] ;  [4] ;  [5] ;  [1] ;  [6]
  1. Hellenic Centre for Marine Research, Crete (Greece)
  2. Helmholtz Centre for Polar and Marine Research, Bremerhaven (Germany)
  3. Delaware Biotechnology Institute, Newark, DE (United States)
  4. Max Planck Institute for Marine Microbiology, Bremen (Germany)
  5. Max Planck Institute for Marine Microbiology, Bremen (Germany); Jacobs Univ. gGmbH, Bremen (Germany)
  6. Univ. of Copenhagen, Copenhagen (Denmark)
Publication Date:
OSTI Identifier:
1253360
Grant/Contract Number:
SC0010838
Type:
Accepted Manuscript
Journal Name:
Database
Additional Journal Information:
Journal Volume: 2016; Journal ID: ISSN 1758-0463
Publisher:
Oxford University Press
Research Org:
Univ. of Delaware, Newark, DE (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 59 BASIC BIOLOGICAL SCIENCES