U.S. Department of Energy Office of Science Office of Scientific and Technical Information

Multimedia and Datasets: Providing Access to New Forms of Nuclear Information

Slide01


 

Brian A. Hitson

United States Department of Energy

Office of Scientific and Technical Information

Slide02

The "Big Data" Era

 

A definition: "A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools." (Wikipedia)

 

How big is "big data"?

 

22,700,000 hits on Google.

Slide03

Everybody Is On Board

 

  • Policymakers
    • U.S. "Big Data" Initiative - $200M (March 2012)
    • European Commission: "Big Data – The Digital Agenda for Europe and Challenges for 2012"
  • Scientists/Authors
    • The Fourth Paradigm – Data-Intensive Scientific Discovery (2009)
    • "Sailing on an Ocean of 0s and 1s," Science, Vol. 327 (2010)
    • "A Deluge of Data Shapes a New Era in Computing," New York Times (14 December 2009)
  • International/National bodies
    • International Council for Science (ICSU)
      • World Data System
      • CODATA
    • U.S. Board on Research Data and Information (BRDI)
Slide04

Nuclear Data

 

  • Nuclear Data*
    • Types:
      • Experimental (e.g., Experimental Nuclear Reaction Data (EXFOR))
      • Evaluated (e.g., Evaluated Nuclear Data File (ENDF-6) and Evaluated Nuclear Structure Data File (ENSDF))
      • Reaction: incident neutrons and incident charged particles and photons
      • Structure and decay data: half-lives, decay schemes, etc. (Nuclear Data Sheets)
    • Other data-intensive nuclear fields:
      • Nuclear medicine
      • Radiation safety
      • Waste management and environmental research
      • Materials analysis
      • Safeguards
      • Nuclear astrophysics

 

* Source: Nuclear Data Section, IAEA, 2000

Slide05

The Challenges of Numeric Data:

 

• Data sets are hard to find.

 

http://nucleardata.nuclear.lu.se/toi/nucSearch.asp

Slide06

The Challenges of Numeric Data:

 

• Data sets are hard to navigate.

 

Screenshot of Experimental Nuclear Reaction Data (EXFOR) Database Version of September 21, 2012

Slide07

The Challenges of Numeric Data:

 

• Data sets are hard to cite.

Slide08

Why Cite Data?

 

Data should be cited in just the same way that other sources of information, such as articles and books, are cited.

Data citation can help by:

  • enabling easy reuse and verification of data
  • allowing the impact of data to be tracked
  • creating a scholarly structure that recognizes and rewards data producers
Slide09

One Solution: DataCite

 

What is DataCite?

» A global consortium composed of local institutions focused on improving the scholarly infrastructure around datasets and other non-textual information.

» A service for assigning Digital Object Identifiers (DOIs) and metadata to data sets.
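
To make the idea of a DOI concrete, here is a minimal sketch in Python of how a DOI string might be checked and turned into a resolvable link. The pattern is a deliberately loose structural check (the "10." prefix, a registrant code, a slash, and a suffix), not a full validator, and the example DOI is hypothetical rather than a registered identifier. The function names are illustrative only and are not part of any DataCite or OSTI interface.

```python
import re

# Loose structural check (an assumption, not an official validator):
# "10.", a numeric registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_plausible_doi(doi: str) -> bool:
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(doi))


def doi_to_url(doi: str) -> str:
    """Build a link that resolves the DOI through the public doi.org proxy."""
    if not is_plausible_doi(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# Hypothetical identifier used only for illustration.
example = "10.9999/example.dataset.001"
print(doi_to_url(example))
```

Because the DOI stays fixed while the landing page behind it can be updated, a citation built on the DOI keeps working even if the data set moves to a new URL.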

Slide10

How Data Citation Works

 

Data Citation metadata submitted to DOE-OSTI (Web Service API or 241.6 AN):

  • Dataset Type
  • Dataset Title
  • Dataset Creator/Author or Principal Investigator
  • Dataset Product Number
  • DOE Contract/Award Number
  • Originating Research Organization
  • Publication/Issue Date
  • Sponsoring Organization
  • URL where the Dataset is posted for access
  • Contact information

DOI assigned by DOE-OSTI

DOE-OSTI submits nightly feed of new DOIs to DataCite

DataCite registers DOI

DataCite validates DOI registration with DOE-OSTI

DOE-OSTI updates metadata record with DOI, creating a full Data Citation

Data Citation submitted to search engines for indexing

Creator/Author, Principal Investigator, or Submitter notified of Data Citation availability
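
The workflow on this slide can be sketched in Python as a metadata record that is checked for completeness, given a DOI, and rendered as a citation. This is an illustrative sketch only: the class and function names, the DOI prefix, the rule of building the DOI from the product number, and the citation layout are all assumptions made for the example, not the actual DOE-OSTI Web Service API or DataCite behavior. All field values are placeholders.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Metadata fields listed on the slide; names here are hypothetical."""
    dataset_type: str
    title: str
    creator: str
    product_number: str
    contract_number: str
    research_org: str
    publication_date: str  # assumed ISO format, e.g. "2012-09-21"
    sponsoring_org: str
    url: str
    contact: str
    doi: Optional[str] = None  # filled in once a DOI is assigned


def missing_fields(record: DatasetRecord) -> List[str]:
    """List the names of required fields that are empty."""
    return [
        f.name
        for f in fields(record)
        if f.name != "doi" and not getattr(record, f.name).strip()
    ]


def assign_doi(record: DatasetRecord, prefix: str) -> DatasetRecord:
    """Attach a DOI built from a prefix and the product number (assumed rule)."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError(f"incomplete record, missing: {', '.join(gaps)}")
    record.doi = f"{prefix}/{record.product_number}"
    return record


def format_citation(record: DatasetRecord) -> str:
    """Render creator, year, title, organization, and DOI link as one line."""
    if record.doi is None:
        raise ValueError("record has no DOI yet")
    year = record.publication_date[:4]
    return (
        f"{record.creator} ({year}): {record.title}. "
        f"{record.research_org}. https://doi.org/{record.doi}"
    )


# Placeholder values for illustration only.
record = DatasetRecord(
    dataset_type="Numeric Data",
    title="Example Reaction Data Set",
    creator="Doe, J.",
    product_number="DS-0001",
    contract_number="XX-0000",
    research_org="Example National Laboratory",
    publication_date="2012-09-21",
    sponsoring_org="Example Sponsor",
    url="https://example.org/data/ds-0001",
    contact="data@example.org",
)
print(format_citation(assign_doi(record, prefix="10.9999")))
```

Checking completeness before assigning the DOI mirrors the order of the steps above: the identifier is only issued once the descriptive metadata is in place.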

Slide12

Multimedia …

 

… an increasingly common form of scientific communication

• Videotaped lectures

Slide13

Multimedia …

 

… an increasingly common form of scientific communication

• Visualizations

Slide14

Multimedia …

 

… an increasingly common form of scientific communication

• Experiments/Simulations

A YouTube search on "nuclear" returns 3,090,000 results

Slide15

The Challenges with Multimedia Science Information

 

  • Lack of written transcripts, i.e., no "full text" to search
  • Metadata, if available, is often minimal
  • Scientific, technical, and medical terminology/vocabulary
  • Videos can be long, often up to an hour or more
Slide16

Access to Multimedia-based Science & Technology

 

A Case Study for Enhanced Multimedia Search & Retrieval

ScienceCinema

http://www.osti.gov/sciencecinema/

  • Partnership between OSTI and Microsoft Research.
  • Launched in February 2011; searches ~2,600 multimedia files from DOE and CERN.
  • Utilizes Microsoft Research Audio Video Indexing System (MAVIS).
  • Enables searching of digitized spoken content.
  • Users can search for a precise term within a video and be directed to the exact point in the video where the term was spoken.
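
The idea of jumping to the point where a term was spoken can be illustrated with a toy Python sketch. It assumes a speech recognizer has already produced a list of (start time, word) pairs; the sample transcript, the function names, and the simple word index are all hypothetical and are not how MAVIS or ScienceCinema is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each entry: (start time in seconds, recognized word). Made-up sample data.
Transcript = List[Tuple[float, str]]


def build_index(transcript: Transcript) -> Dict[str, List[float]]:
    """Map each lowercased word to the times at which it was spoken."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, word in transcript:
        index[word.lower().strip(".,;:!?")].append(start)
    return dict(index)


def find_term(index: Dict[str, List[float]], term: str) -> List[float]:
    """Return the start times of every occurrence of the term."""
    return index.get(term.lower(), [])


def as_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS for a 'jump to' link."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


transcript: Transcript = [
    (12.4, "The"),
    (12.9, "neutron"),
    (13.5, "flux"),
    (754.0, "Neutron"),
    (754.6, "capture."),
]
index = build_index(transcript)
for t in find_term(index, "neutron"):
    print(as_timestamp(t))
```

Storing a time alongside every recognized word is what lets a search result point into the middle of an hour-long video instead of only at its start.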
Slide18

Summary

 

  • Big Data is here.
  • Data citation makes data:
    • easier to find
    • easier to navigate
  • Scientific multimedia is here.
  • Speech indexing makes multimedia:
    • easier to search
    • more productive for the scientist and student
Slide19

Thank You!

Brian A. Hitson
hitsonb@osti.gov
www.osti.gov
865-576-1199