Multimedia and Datasets: Providing Access to New
Forms of Nuclear Information
Slide 1: Multimedia and Datasets: Providing Access to New Forms of Nuclear Information
Brian A. Hitson
United States Department of Energy
Office of Scientific & Technical Information
Slide 2: The "Big Data" Era
A definition: "A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools." (Wikipedia)
How big is "big data"?
22,700,000 hits on Google.
Slide 3: Everybody Is On Board
• Policymakers
✦ U.S. "Big Data" Initiative - $200M (March 2012)
✦ European Commission: "Big Data – The Digital Agenda for Europe and Challenges for 2012"
• Scientists/Authors
✦ The Fourth Paradigm – Data-Intensive Scientific Discovery (2009)
✦ "Sailing on an Ocean of 0s and 1s," Science, Vol. 327 (2010)
✦ "A Deluge of Data Shapes a New Era in Computing," New York Times (14 December 2009)
• International/National bodies
✦ International Council for Science (ICSU)
    • World Data System
    • CODATA
✦ U.S. Board on Research Data and Information (BRDI)
Slide 4: Nuclear Data
• Nuclear Data*
• Types:
✦ Experimental (e.g., Experimental Nuclear Reaction Data (EXFOR))
✦ Evaluated (e.g., Evaluated Nuclear Data File (ENDF-6) and Evaluated Nuclear Structure Data File (ENSDF))
✦ Reaction: incident neutrons, incident charged particles, and photons
✦ Structure and decay data: half-lives, decay schemes, etc. (Nuclear Data Sheets)
• Other data-intensive nuclear fields:
✦ Nuclear medicine
✦ Radiation safety
✦ Waste management and environmental research
✦ Materials analysis
✦ Safeguards
✦ Nuclear astrophysics
Slide 5: The Challenges of Numeric Data:
- Data sets are hard to find.
http://nucleardata.nuclear.lu.se/toi/nucSearch.asp
Slide 6: The Challenges of Numeric Data:
- Data sets are hard to navigate.
Screenshot of Experimental Nuclear Reaction Data (EXFOR)
Database Version of September 21, 2012
Slide 7: The Challenges of Numeric Data:
- Data sets are hard to cite.
Slide 8: Why Cite Data?
Data should be cited in just the same way that other sources of information, such as articles and books, are cited.
Data citation can help by:
✓ enabling easy reuse and verification of data
✓ allowing the impact of data to be tracked
✓ creating a scholarly structure that recognizes and rewards data producers
Slide 9: One Solution: DataCite
What is DataCite?
» A global consortium composed of local institutions focused on improving the scholarly infrastructure around datasets and other non-textual information.
» A service for assigning Digital Object Identifiers (DOIs) and metadata to data sets.
Slide 10: How Data Citation Works
Data Citation metadata submitted to DOE-OSTI (via Web Service API or 241.6 AN):
- Dataset Type
- Dataset Title
- Dataset Creator/Author or Principal Investigator
- Dataset Product Number
- DOE Contract/Award Number
- Originating Research Organization
- Publication/Issue Date
- Sponsoring Organization
- URL where the Dataset is posted for access
- Contact information
Steps after submission:
- DOI assigned by DOE-OSTI
- DOE-OSTI submits nightly feed of new DOIs to DataCite
- DataCite registers DOI
- DataCite validates DOI registration with DOE-OSTI
- DOE-OSTI updates metadata record with DOI, creating a full Data Citation
- Data Citation submitted to search engines for indexing
- Creator/Author, Principal Investigator, or Submitter notified of Data Citation availability
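The metadata fields and DOI assignment on this slide can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions: the class name, field names, helper functions, and citation layout are all hypothetical, and it does not reproduce the actual DOE-OSTI Web Service API or the DataCite schema. It shows a record holding the listed fields, a check for fields left empty before submission, and a citation string built once a DOI is present (using the public `https://doi.org/` resolver prefix).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Metadata fields listed on the slide; the field names are hypothetical."""
    dataset_type: str
    title: str
    creator: str            # creator/author or principal investigator
    product_number: str
    contract_number: str    # DOE contract/award number
    research_org: str       # originating research organization
    publication_date: str   # assumed ISO format, e.g. "2012-09-21"
    sponsoring_org: str
    url: str                # where the dataset is posted for access
    contact: str
    doi: Optional[str] = None  # filled in once a DOI has been assigned


def missing_fields(record: DatasetRecord) -> list:
    """Return the names of metadata fields (other than the DOI) that are empty."""
    return [name for name, value in vars(record).items()
            if name != "doi" and not str(value).strip()]


def format_citation(record: DatasetRecord) -> str:
    """Build a simple citation string; the layout is illustrative only."""
    if record.doi is None:
        raise ValueError("record has no DOI yet")
    year = record.publication_date[:4]
    return (f"{record.creator} ({year}). {record.title} "
            f"[{record.dataset_type}]. {record.research_org}. "
            f"https://doi.org/{record.doi}")
```

In this sketch the DOI is the only optional field, mirroring the slide's sequence in which the record exists first and becomes a full Data Citation only after the DOI is added to it.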
Slide 11: Data Citation Demo
Play Demonstration of Data Citation (opens new window)
Slide 12: Multimedia …
… an increasingly common form of scientific communication
- Videotaped lectures
Slide 13: Multimedia …
… an increasingly common form of scientific communication
- Visualizations
Slide 14: Multimedia …
… an increasingly common form of scientific communication
- Experiments/Simulations
YouTube search on "nuclear" has 3,090,000 results
Slide 15: The Challenges with Multimedia Science Information
Translating ten languages, with potential for more
- Lack of written transcripts, i.e., no "full text" to search
- Metadata, if available, is often minimal
- Scientific, technical, and medical terminology/vocabulary
- Videos can be long, often up to an hour or more
Slide 16: Access to Multimedia-based Science & Technology
A Case Study for Enhanced Multimedia Search & Retrieval
ScienceCinema
http://www.osti.gov/sciencecinema/
- Partnership between OSTI and Microsoft Research.
- Launched in February 2011; searches ~1,800 multimedia files.
- Utilizes Microsoft Research Audio Video Indexing System (MAVIS).
- Enables searching of digitized spoken content.
- Users can search for a precise term within a video and be directed to the exact point in the video where the term was spoken.
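The idea of searching spoken content and jumping to the moment a term was spoken can be sketched in a few lines of Python. This is a simplified illustration, not a description of how MAVIS or ScienceCinema work internally: it assumes a speech recognizer has already produced word-level timestamps, and the function names and input layout are hypothetical.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercased word to the (video_id, seconds) positions where it occurs.

    `transcripts` maps a video id to a list of (seconds, word) pairs, the kind
    of output a recognizer with word-level timestamps might produce.
    """
    index = defaultdict(list)
    for video_id, words in transcripts.items():
        for seconds, word in words:
            key = word.lower().strip(".,;:!?")
            index[key].append((video_id, seconds))
    return index


def search(index, term):
    """Return every (video_id, seconds) position where the term was spoken."""
    return sorted(index.get(term.lower(), []))


def format_offset(seconds):
    """Render an offset in seconds as H:MM:SS, e.g. for a playback link."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

Each hit carries a time offset, so a player could start playback at that point instead of at the beginning of an hour-long lecture.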
Slide 17: Multimedia Search Demo
Play Demonstration of Multimedia Search (opens new window)
Slide 18: Summary
- Big Data is here.
- Data citation makes data:
  - easier to find
  - easier to navigate
- Scientific multimedia is here.
- Speech indexing makes multimedia:
  - easier to search
  - more productive for the scientist and student
Slide 19: Thank You!
Brian A. Hitson
hitsonb@osti.gov
www.osti.gov
865-576-1199


