U.S. Department of Energy Office of Science Office of Scientific and Technical Information

Multimedia and Visualization Innovations for Science



Behrooz Chitsaz
Microsoft Research
Lorrie Apple Johnson
U.S. Department of Energy


Multimedia Research

• Speech Search
• Face identification
• Object recognition
• Video browsing
• Semantic extraction
• (3D) Segmentation
• (3D) Image search


Speech Applications

Speech as interface
Mobile access
• Directory services
• PC application
• Web service
Text input
• Dictation
Speech as 1st class content
• Search
• Keyword extraction
• Meetings
• Voicemails
• Closed Caption
• Translating phone


Speech recognition

Spectral Analysis
Matching (Decoding)
time alignment, most likely hypothesis
$W' = \arg\max_{w_1 \ldots w_N} \; p(o_1 \ldots o_T \mid w_1 \ldots w_N) \, P(w_1 \ldots w_N)$
"Hello World"
Acoustic Models $p(o_1 \ldots o_T \mid \text{phoneme})$
Dictionary $P(\text{phonemes} \mid w)$
Grammar (Language Model) $P(w_1 \ldots w_N)$
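To make the decoding rule concrete, here is a minimal sketch that picks the most likely word sequence by adding an acoustic log-score to a language-model log-score. The candidate transcripts and all scores are made-up illustrative values, and a real recognizer searches a far larger hypothesis space rather than a hand-written dictionary.

```python
# Hypothetical log-probabilities for three candidate transcripts.
# "acoustic" stands for log p(o_1..o_T | w_1..w_N),
# "language" stands for log P(w_1..w_N). Values are illustrative only.
HYPOTHESES = {
    "hello world": {"acoustic": -12.0, "language": -3.0},
    "hollow world": {"acoustic": -11.5, "language": -7.0},
    "hello word": {"acoustic": -12.5, "language": -5.0},
}


def decode(hypotheses):
    """Return the transcript maximizing log p(O|W) + log P(W)."""
    return max(
        hypotheses,
        key=lambda words: hypotheses[words]["acoustic"] + hypotheses[words]["language"],
    )


best = decode(HYPOTHESES)
print(best)
```

Note that "hollow world" has the best acoustic score on its own; the language model is what tips the decision toward "hello world".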


MAVIS technology

• Indexing automatic transcripts as text
– Automatic transcription accuracy is only 50-80%
• MAVIS techniques
– Word-level lattice indexing
• index word alternatives – robust to recognizer errors
• 50-140% accuracy improvement
• index timing – navigate to exact point in video
– Vocabulary Adaptation
• Use NLP and Bing Search to expand word dictionary
– Automatic keywords to expose to search engines
• Enables discovery of speech content through search engines
• By-product of vocabulary adaptation
– See http://research.microsoft.com/mavis
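The idea behind word-level lattice indexing can be sketched as an inverted index that keeps every plausible word alternative, with its start time and confidence, instead of only the single best transcript. This is a simplified illustration and not the MAVIS implementation; the lattice entries, the `min_posterior` threshold, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical lattice: (start_seconds, word, posterior) triples.
# Several alternatives can compete for the same time slot.
LATTICE = [
    (12.4, "neutrino", 0.55),
    (12.4, "neutrinos", 0.30),
    (12.4, "new", 0.15),
    (13.1, "detector", 0.90),
    (13.1, "director", 0.10),
]


def build_index(lattice, min_posterior=0.2):
    """Map each word alternative to its (start_seconds, posterior) hits."""
    index = defaultdict(list)
    for start, word, posterior in lattice:
        if posterior >= min_posterior:  # drop very unlikely alternatives
            index[word].append((start, posterior))
    return index


def search(index, term):
    """Return hits for a term, most confident first."""
    return sorted(index.get(term.lower(), []), key=lambda hit: hit[1], reverse=True)


index = build_index(LATTICE)
print(search(index, "neutrinos"))
```

An index built from the best transcript alone would contain only "neutrino" and "detector", so a query for "neutrinos" would miss this passage. Keeping the start time with each hit is what allows navigation to the point in the video where the word was spoken.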


MAVIS Architecture

Microsoft Azure

• Store content to be processed in temporary Azure storage
• Do vocabulary adaptation using Bing
• Run recognition engine on content
• Store results of the recognition process as an audio index blob (AIB)

1. Submit audio/video RSS
2. Retrieve AIB
3. Import AIB in SQL
4. Search/Retrieve results
Web server(s)
SQL Server(s)
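Steps 3 and 4 above can be sketched as follows. The AIB format is not described here, so the example assumes the blob has already been decoded into rows of (video URL, word, start time, confidence); the table name, column names, and URLs are hypothetical, and an in-memory SQLite database stands in for the SQL servers.

```python
import sqlite3

# Hypothetical rows decoded from an AIB: (video_url, word, start_seconds, confidence).
AIB_ROWS = [
    ("http://example.org/talk1.wmv", "fusion", 95.2, 0.81),
    ("http://example.org/talk1.wmv", "plasma", 130.0, 0.64),
    ("http://example.org/talk2.wmv", "fusion", 12.5, 0.47),
]


def import_aib(conn, rows):
    """Step 3: load decoded index rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_hits ("
        "video_url TEXT, word TEXT, start_seconds REAL, confidence REAL)"
    )
    conn.executemany("INSERT INTO word_hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def search(conn, term):
    """Step 4: return (video_url, start_seconds, confidence), best match first."""
    cursor = conn.execute(
        "SELECT video_url, start_seconds, confidence FROM word_hits "
        "WHERE word = ? ORDER BY confidence DESC",
        (term.lower(),),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
import_aib(conn, AIB_ROWS)
hits = search(conn, "fusion")
print(hits)
```

Each result carries a start time, so the web server can direct the player to the moment the term was spoken rather than to the beginning of the video.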


U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Mission

• DOE invests > $10 billion/year in basic sciences, clean energy technology, and nuclear research.
• The immediate output from this investment is Information…Knowledge…R&D results
• OSTI's mission is to accelerate scientific progress by accelerating access to this information.


OSTI's Core Products

• Information Bridge
• Science Accelerator
• Science.gov




Emerging Forms of Scientific Information Require New Tools

• Numeric data, multimedia, and social media are emerging forms of scientific information
• Each form presents special opportunities and challenges


Search and Retrieval Challenges with Multimedia Science Information

• Lack of written transcripts, i.e., no "full text" to search
• Metadata, if available, is often minimal
• Scientific, technical, and medical terminology/vocabulary
• Videos can be long, often up to an hour or more


OSTI and Microsoft Research Partnership

• Video files collected from DOE's National Laboratories
• RSS feeds with metadata and URLs sent to Microsoft Research
• Audio indexing performed via MAVIS
• Audio index blob (AIB) returned to OSTI and integrated with SQL servers
• Users can search for a precise term within the video, and be directed to the exact point in the video where the term was spoken
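As an illustration of the second step in this workflow, the sketch below builds a small RSS 2.0 feed that carries a title, a description, and a media URL for each video. The exact feed layout exchanged between OSTI and Microsoft Research is not specified here; the use of the standard `enclosure` element, the field names, and the example URLs are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical video record; every value is a placeholder.
VIDEOS = [
    {
        "title": "Example laboratory lecture",
        "description": "Placeholder abstract for illustration only.",
        "url": "http://example.org/videos/lecture1.wmv",
        "length": "0",  # size in bytes; unknown in this sketch
        "mime_type": "video/x-ms-wmv",
    },
]


def build_feed(channel_title, channel_link, videos):
    """Return an RSS 2.0 document (as a string) listing the given videos."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Videos submitted for audio indexing"
    for video in videos:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = video["title"]
        ET.SubElement(item, "description").text = video["description"]
        # The enclosure element points at the media file to be indexed.
        ET.SubElement(
            item,
            "enclosure",
            url=video["url"],
            length=video["length"],
            type=video["mime_type"],
        )
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed("Example video feed", "http://example.org/videos", VIDEOS)
print(feed_xml)
```

Richer metadata in the feed (titles, abstracts) would also give the vocabulary adaptation step more domain text to work with, which matters for scientific terminology.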


Demonstration of ScienceCinema


Looking to the Future

• Additional content from DOE researchers
• Integration of multimedia searches into WorldWideScience.org by June
• High quality automatic closed captions
• Multilingual translation capabilities