OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation

Journal Article · Database
 [1];  [2];  [3];  [4];  [5];  [1];  [6]
  1. Hellenic Centre for Marine Research, Crete (Greece)
  2. Helmholtz Centre for Polar and Marine Research, Bremerhaven (Germany)
  3. Delaware Biotechnology Institute, Newark, DE (United States)
  4. Max Planck Institute for Marine Microbiology, Bremen (Germany)
  5. Max Planck Institute for Marine Microbiology, Bremen (Germany); Jacobs Univ. gGmbH, Bremen (Germany)
  6. Univ. of Copenhagen, Copenhagen (Denmark)

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual sample annotation is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. A comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15–25% and helps curators detect terms that would otherwise have been missed.

Research Organization:
Univ. of Delaware, Newark, DE (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0010838
OSTI ID:
1253360
Journal Information:
Database, Vol. 2016; ISSN 1758-0463
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 21 works
Citation information provided by
Web of Science

References (27)

The genomic standards consortium: bringing standards to life for microbial ecology journal April 2011
The environment ontology: contextualising biological and biomedical entities journal January 2013
BRENDA in 2015: exciting developments in its 25th year of existence journal November 2014
Uberon, an integrative multi-species anatomy ontology journal January 2012
Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data journal October 2014
The mammalian phenotype ontology: enabling robust annotation and comparative analysis journal November 2009
LINNAEUS: A species name identification system for biomedical literature journal January 2010
The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text journal June 2013
Anatomical Entity Recognition with a Hierarchical Framework Augmented by External Resources journal October 2014
Comprehensive comparison of large-scale tissue expression datasets journal January 2015
An overview of MetaMap: historical perspective and recent advances journal May 2010
Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications journal September 2010
DNorm: disease name normalization with pairwise learning to rank journal August 2013
DISEASES: Text mining and data integration of disease–gene associations journal March 2015
ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life journal January 2015
Reflect: augmented browsing for the life scientist journal June 2009
The Encyclopedia of Life v2: Providing Global Access to Knowledge About Life on Earth journal April 2014
The Pacific Ocean Virome (POV): A Marine Viral Metagenomic Dataset and Associated Protein Clusters for Quantitative Viral Ecology journal February 2013
Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts journal January 2012
tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles journal January 2014
Egas: a collaborative and interactive document curation platform journal January 2014
Text-mining-assisted biocuration workflows in Argo journal January 2014
VIROME: a standard operating procedure for analysis of viral metagenome sequences journal July 2012
Gene Ontology: tool for the unification of biology journal May 2000
DISEASES: Text mining and data integration of disease–gene associations preprint August 2014
The gene normalization task in BioCreative III journal October 2011
Comprehensive comparison of large-scale tissue expression datasets posted_content May 2015

Cited By (14)

Identifying bacterial biotope entities using sequence labeling: Performance and feature analysis journal May 2018
Accelerating annotation of articles via automated approaches: evaluation of the neXtA5 curation-support tool by neXtProt journal January 2018
ezTag: tagging biomedical concepts via interactive learning journal May 2018
Applying Citizen Science to Gene, Drug, Disease Relationship Extraction from Biomedical Abstracts journal February 2019
MER: a shell script and annotation server for minimal named entity recognition and linking journal December 2018
Design, implementation, and operation of a rapid, robust named entity recognition web service journal March 2019
SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data journal January 2016
MER: a Shell Script and Annotation Server for Minimal Named Entity Recognition and Linking text January 2018
SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data journal January 2016
Crowdsourcing and curation: perspectives from biology and natural language processing journal January 2016
Overview of the interactive task in BioCreative V journal January 2016
Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges journal January 2016
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements journal October 2016
EXTRACT 2.0: text-mining-assisted interactive annotation of biomedical named entities and ontology terms preprint February 2017

Similar Records

Metazen – metadata capture for metagenomes
Journal Article · December 8, 2014 · Standards in Genomic Sciences · OSTI ID:1253360

BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics
Technical Report · October 29, 2016 · OSTI ID:1253360

The JCVI standard operating procedure for annotating prokaryotic metagenomic shotgun sequencing data
Journal Article · March 30, 2010 · Standards in Genomic Sciences · OSTI ID:1253360