ScienceCinema
Behrooz Chitsaz
Microsoft Research
Lorrie Apple Johnson
U.S. Department of Energy
Multimedia Research
• Speech Search
• Face identification
• Object recognition
• Video browsing
• Semantic extraction
• (3D) Segmentation
• (3D) Image search
Speech Applications
Speech as interface
Mobile access
• Directory services
Automation
• PC application
• Web service
Text input
• Dictation
Speech as 1st class content
Indexing
• Search
• Keyword extraction
Transcription
• Meetings
• Voicemails
• Closed Caption
Translation
• Translating phone
Speech recognition
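Spectral Analysis
• Output: observation sequence $o_1 \ldots o_T$
Matching (Decoding)
• Time alignment and search for the most likely hypothesis:
  $W' = \arg\max_{w_1 \ldots w_N} p(o_1 \ldots o_T \mid w_1 \ldots w_N)\, P(w_1 \ldots w_N)$
• Example result: "Hello World"
Acoustic Models: $p(o_t \ldots o_\tau \mid \text{phoneme})$
Dictionary: $P(\text{phonemes} \mid w)$
Grammar (Language Model): $P(w_1 \ldots w_N)$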
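The decoding rule on this slide picks the word sequence whose acoustic likelihood times language-model probability is largest. A toy illustration follows. It is a minimal sketch, not the recognizer used by MAVIS: the candidate transcripts and every probability in `CANDIDATES` are made-up numbers, and a real decoder searches a very large hypothesis space rather than enumerating a short list.

```python
import math

# Hypothetical candidate transcripts with made-up probabilities.
# "acoustic" stands for p(o_1..o_T | w_1..w_N), "language" for P(w_1..w_N).
CANDIDATES = {
    "hello world": {"acoustic": 1e-5, "language": 1e-3},
    "hollow word": {"acoustic": 2e-5, "language": 1e-5},
    "hello whirled": {"acoustic": 1.5e-5, "language": 1e-6},
}


def decode(candidates):
    """Return the hypothesis maximizing log p(O|W) + log P(W)."""
    def score(item):
        _, probs = item
        # Work in the log domain so tiny probabilities do not underflow.
        return math.log(probs["acoustic"]) + math.log(probs["language"])

    best_words, _ = max(candidates.items(), key=score)
    return best_words


best = decode(CANDIDATES)
print(best)
```

In this example "hollow word" has the highest acoustic score, but the language model makes "hello world" the winner, which is the role the Grammar (Language Model) component plays in the product.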
MAVIS technology
• Indexing automatic transcripts as text
– Automatic transcription accuracy is only 50-80%
• MAVIS techniques
– Word-level lattice indexing
• index word alternatives – robust to recognizer errors
• 50-140% accuracy improvement
• index timing – navigate to exact point in video
– Vocabulary Adaptation
• Use NLP and Bing Search to expand word dictionary
– Automatic keywords to expose to search engines
• Enables discovery of speech content through search engines
• By-product of vocabulary adaptation
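Word-level lattice indexing, as listed on this slide, keeps the recognizer's alternative words and their timing instead of only the single best transcript. The sketch below shows the idea with an in-memory inverted index. It assumes a simplified lattice (a list of time slots, each with alternative words and posterior probabilities); the `LATTICE` data, the `build_index` and `search` helpers, and the `min_posterior` threshold are hypothetical and do not describe the actual MAVIS data structures.

```python
from collections import defaultdict

# Hypothetical simplified lattice: (start time in seconds, [(word, posterior), ...]).
LATTICE = [
    (12.0, [("nuclear", 0.55), ("unclear", 0.40)]),
    (12.6, [("fusion", 0.70), ("fission", 0.25)]),
    (13.1, [("reactor", 0.90)]),
]


def build_index(video_id, lattice):
    """Index every alternative word, not just the top-scoring one."""
    index = defaultdict(list)
    for start, alternatives in lattice:
        for word, posterior in alternatives:
            index[word].append((video_id, start, posterior))
    return index


def search(index, term, min_posterior=0.1):
    """Return (video_id, start, posterior) hits, most confident first."""
    hits = [h for h in index.get(term.lower(), []) if h[2] >= min_posterior]
    return sorted(hits, key=lambda h: h[2], reverse=True)


index = build_index("talk-001", LATTICE)
hits = search(index, "fission")
print(hits)
```

A 1-best transcript of this lattice would contain only "fusion", so a query for "fission" would miss. Indexing the alternatives keeps the lower-ranked word searchable, and the stored start time is what allows navigation to the exact point in the video.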
MAVIS Architecture
Microsoft Azure
• Store content to be processed in temporary Azure storage
• Do vocabulary adaptation using Bing
• Run recognition engine on content
• Store results of the recognition process (AIB)
1. Submit audio/video RSS
2. Retrieve AIB
3. Import AIB in SQL
4. Search/Retrieve results
Web server(s)
SQL Server(s)
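Steps 3 and 4 of the workflow above (import the AIB into SQL, then search and retrieve results) can be sketched as follows. This is an illustration under stated assumptions only: the AIB format is not described on the slides, so `AIB_ROWS` stands for rows already decoded from one, the `word_hits` table and its columns are hypothetical, and an in-memory SQLite database stands in for the SQL Server(s) in the diagram.

```python
import sqlite3

# Hypothetical rows decoded from an AIB: (video_id, word, start_sec, confidence).
AIB_ROWS = [
    ("talk-001", "fusion", 12.6, 0.70),
    ("talk-001", "reactor", 13.1, 0.90),
    ("talk-002", "fusion", 305.2, 0.45),
]

# In-memory SQLite is a stand-in for the SQL Server(s) in the architecture.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_hits "
    "(video_id TEXT, word TEXT, start_sec REAL, confidence REAL)"
)
conn.execute("CREATE INDEX idx_word ON word_hits (word)")

# Step 3: import the decoded index rows.
conn.executemany("INSERT INTO word_hits VALUES (?, ?, ?, ?)", AIB_ROWS)


def find_term(connection, term):
    """Step 4: return (video_id, start_sec) for a term, best match first."""
    cursor = connection.execute(
        "SELECT video_id, start_sec FROM word_hits "
        "WHERE word = ? ORDER BY confidence DESC",
        (term.lower(),),
    )
    return cursor.fetchall()


results = find_term(conn, "fusion")
print(results)
```

A web server could turn each returned `(video_id, start_sec)` pair into a link that opens the video at that offset.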
U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Mission
• DOE invests > $10 billion/year in basic sciences, clean energy technology, and nuclear research.
• The immediate output from this investment is Information…Knowledge… R&D results
• OSTI's mission is to accelerate scientific progress by accelerating access to this information.
OSTI's Core Products
• Information Bridge
• Science Accelerator
• Science.gov
• WorldWideScience.org
Emerging Forms of Scientific Information Require New Tools
• Numeric data, multimedia, and social media are emerging forms of scientific information
• Each form presents special opportunities and challenges
Search and Retrieval Challenges with Multimedia Science Information
• Lack of written transcripts, i.e. no "full text" to search
• Metadata, if available, is often minimal
• Scientific, technical, and medical terminology/vocabulary
• Videos can be long, often up to an hour or more
OSTI and Microsoft Research Partnership
• Video files collected from DOE's National Laboratories
• RSS feeds with metadata and URLs sent to Microsoft Research
• Audio indexing performed via MAVIS
• Audio index blob (AIB) returned to OSTI and integrated with SQL servers
• Users can search for a precise term within the video, and be directed to the exact point in the video where the term was spoken
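The hand-off described above starts with RSS feeds that carry metadata and media URLs. As a rough sketch of what reading such a feed could look like, the example below parses a small RSS 2.0 document with Python's standard library. The feed content, the `example.org` URL, and the `list_media` helper are hypothetical; the actual feeds exchanged between OSTI and Microsoft Research are not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed with one video item; all values are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example lab videos</title>
    <item>
      <title>Example talk on fusion energy</title>
      <description>Placeholder abstract.</description>
      <enclosure url="https://example.org/videos/talk-001.mp4"
                 type="video/mp4" length="0"/>
    </item>
  </channel>
</rss>"""


def list_media(feed_xml):
    """Return title and media URL for every item that has an enclosure."""
    root = ET.fromstring(feed_xml)
    media = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # no media file attached to this item
        media.append({
            "title": item.findtext("title", default=""),
            "url": enclosure.get("url"),
        })
    return media


media = list_media(FEED)
print(media)
```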
Demonstration of ScienceCinema
Looking to the Future
• Additional content from DOE researchers
• Integration of multimedia searches into WorldWideScience.org by June
• High quality automatic closed captions
• Multilingual translation capabilities


