OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: EXSCLAIM!

Abstract

Due to recent improvements in image resolution and acquisition speed, materials microscopy is experiencing an explosion of published imaging data. The standard publication format, while sufficient for traditional data ingestion scenarios where a select number of images can be critically examined and curated manually, is not conducive to large-scale data aggregation or analysis, hindering data sharing and reuse. Most images in publications are presented as components of a larger figure with their explicit context buried in the main body or caption text, so even if aggregated, collections of images with weak or no digitized contextual labels have limited value. To solve the problem of curating labeled microscopy data from literature, we introduce the EXSCLAIM! Python toolkit for the automatic EXtraction, Separation, and Caption-based natural Language Annotation of IMages from scientific literature. The software is implemented as a three-part pipeline: the JournalScraper, which searches the web and downloads figures and captions based on a user-provided query; the CaptionDistributor, which separates caption text based on the subfigure each portion of the caption refers to; and the FigureSeparator, which separates figures into component subfigures and extracts other visual information. Also included is a Django user interface for exploring the resulting dataset.
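
To make the caption-distribution step concrete, here is a minimal sketch of splitting a compound caption into per-subfigure text. It is not the toolkit's actual method or API: the function name `distribute_caption`, the `LABEL_PATTERN` regular expression, and the assumption that subfigures are labeled "(a)", "(b)", and so on are all hypothetical and chosen only for illustration. Real captions are often less regular than this.

```python
import re
from typing import Dict

# Hypothetical convention: subfigure labels appear as "(a)", "(b)", ...
LABEL_PATTERN = re.compile(r"\(([a-z])\)")


def distribute_caption(caption: str) -> Dict[str, str]:
    """Map each subfigure label to the caption text that follows it.

    Illustrative only; this simple regex split stands in for whatever
    language processing the real CaptionDistributor performs.
    """
    matches = list(LABEL_PATTERN.finditer(caption))
    segments: Dict[str, str] = {}
    for i, match in enumerate(matches):
        start = match.end()
        # Text for this label runs until the next label or the caption's end.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        segments[match.group(1)] = caption[start:end].strip(" ,;.")
    return segments


if __name__ == "__main__":
    example = "(a) TEM image of Au nanoparticles. (b) SEM image of the substrate."
    for label, text in distribute_caption(example).items():
        print(label, "->", text)
```

A caption with no recognizable labels yields an empty dictionary, which a caller could treat as "the caption applies to the whole figure".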

Developers:
Chan, Maria; Spreadbury, Trevor Joseph; Schwenker, Eric S; Jiang, Weixin
Release Date:
2022-04-08
Project Type:
Open Source, Publicly Available Repository
Software Type:
Scientific
Licenses:
GNU General Public License v3.0
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities Division

Primary Award/Contract Number:
AC02-06CH11357
Code ID:
72658
Site Accession Number:
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Country of Origin:
United States

Citation Formats

CHAN, MARIA, SPREADBURY, TREVOR JOSEPH, SCHWENKER, ERIC S, JIANG, WEIXIN, and USDOE Office of Science. EXSCLAIM!. Computer software. https://www.osti.gov//servlets/purl/1862008. USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities Division. 8 Apr. 2022. Web. doi:10.11578/dc.20220408.1.
CHAN, MARIA, SPREADBURY, TREVOR JOSEPH, SCHWENKER, ERIC S, JIANG, WEIXIN, & USDOE Office of Science. (2022, April 8). EXSCLAIM! [Computer software]. https://www.osti.gov//servlets/purl/1862008. https://doi.org/10.11578/dc.20220408.1
CHAN, MARIA, SPREADBURY, TREVOR JOSEPH, SCHWENKER, ERIC S, JIANG, WEIXIN, and USDOE Office of Science. EXSCLAIM!. Computer software. April 8, 2022. https://www.osti.gov//servlets/purl/1862008. doi:https://doi.org/10.11578/dc.20220408.1.
@misc{osti_1862008,
title = {EXSCLAIM!},
author = {CHAN, MARIA and SPREADBURY, TREVOR JOSEPH and SCHWENKER, ERIC S and JIANG, WEIXIN and USDOE Office of Science},
abstractNote = {Due to recent improvements in image resolution and acquisition speed, materials microscopy is experiencing an explosion of published imaging data. The standard publication format, while sufficient for traditional data ingestion scenarios where a select number of images can be critically examined and curated manually, is not conducive to large-scale data aggregation or analysis, hindering data sharing and reuse. Most images in publications are presented as components of a larger figure with their explicit context buried in the main body or caption text, so even if aggregated, collections of images with weak or no digitized contextual labels have limited value. To solve the problem of curating labeled microscopy data from literature, we introduce the EXSCLAIM! Python toolkit for the automatic EXtraction, Separation, and Caption-based natural Language Annotation of IMages from scientific literature. The software is implemented as a three-part pipeline: the JournalScraper, which searches the web and downloads figures and captions based on a user-provided query; the CaptionDistributor, which separates caption text based on the subfigure each portion of the caption refers to; and the FigureSeparator, which separates figures into component subfigures and extracts other visual information. Also included is a Django user interface for exploring the resulting dataset.},
url = {https://www.osti.gov//servlets/purl/1862008},
doi = {10.11578/dc.20220408.1},
year = {2022},
month = {4},
note = {Record page: https://www.osti.gov/biblio/1862008}
}