Multilingual WorldWideScience:
Accelerating Discovery through Multilingual Translations
Slide 1: Multilingual WorldWideScience:
Accelerating Discovery through Multilingual Translations
International Council for Scientific and Technical Information (ICSTI) Annual Conference
June 2010, Helsinki, Finland
Walter L. Warnick, Ph.D.
Director
Office of Scientific & Technical Information
U.S. Department of Energy
Slide 2: Science Advances Only if Knowledge is Shared
"If I have seen further it is only by standing on the shoulders of giants." Sir Isaac Newton
Corollary 1: Scientific discovery can be accelerated by accelerating access to worldwide scientific information.
The case for WorldWideScience.org.
Corollary 2: Multilingual translations of science will further accelerate scientific discovery.
The case for Multilingual WorldWideScience.org
Slide 3: The "Accelerating" Power of WorldWideScience.org
Overcoming the researcher’s practical limitations:
- Not knowing "what’s out there." (examples: Korean medical journals, Australian Antarctic data, South African scientific research database)
- Inadequate time to search scientific databases one by one. (examples: UK PubMed Central, Ginsparg’s arXiv.org)
- Inability to sort compiled results by relevance.
By filling these gaps, WorldWideScience.org has accelerated access to scientific information.
Slide 4: Brief History: Federated Search and WorldWideScience.org
Deep Web
- where science is
- hundreds of times larger than the "surface web"
- generally not "googleable," or searchable, by major search engines
Slide 5: Deep Web Solution: Federated Searching
- A single user query simultaneously sent to multiple deep web databases.
- Federated search engine sorts and presents results in relevance-ranked order.
- Overcomes the 3 practical limitations.
- No burden on individual database "owners."
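The federated search idea above can be sketched in a few lines. This is a minimal illustration only, not the WorldWideScience.org implementation: the two in-memory "databases", the `search_one` and `federated_search` helpers, and the term-count relevance score are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for deep web databases (not real sources).
DATABASES = {
    "db_a": ["solar energy storage", "wind turbine design"],
    "db_b": ["solar cell efficiency and solar energy", "protein folding"],
}


def search_one(name, query):
    """Search one database and return (source, title, score) tuples."""
    terms = query.lower().split()
    hits = []
    for title in DATABASES[name]:
        words = title.lower().split()
        # Toy relevance score: how often the query terms occur in the title.
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((name, title, score))
    return hits


def federated_search(query):
    """Send one query to every database at once, then merge and rank."""
    with ThreadPoolExecutor() as pool:
        per_db = list(pool.map(lambda name: search_one(name, query), DATABASES))
    merged = [hit for hits in per_db for hit in hits]
    # Present the combined results in relevance-ranked order.
    return sorted(merged, key=lambda hit: hit[2], reverse=True)


results = federated_search("solar energy")
for source, title, score in results:
    print(source, title, score)
```

The user issues one query; the databases are searched simultaneously and need no changes on their side, which mirrors the "no burden on database owners" point.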
Slide 6: Federated Search Examples
- Science.gov – searches across all U.S. federal science agencies' databases (200 million pages)
- Similar – but different – experiences outside science:
- Kayak.com – "compare hundreds of travel sites at once"
- Pricegrabber.com – comparison shopping across multiple merchants
Slide 7: Global Federated Search
- Taking the Science.gov model global – WorldWideScience.org
- Initial partnership between U.S. Department of Energy and the British Library (2007)
Slide 8: Global Federated Search
- Transition to multilateral governance (WorldWideScience Alliance) and ICSTI sponsorship (2008)
Slide 9: WorldWideScience – Facts and Figures
- Tremendous growth in search content: from 10 nations to 65 nations in 3 years
- > 400 million pages
- From well-known sources: e.g., PubMed, CERN, KoreaScience
- To more obscure sources: e.g., Bangladesh Journals Online
Slide 10: WorldWideScience – Fills Key Niche in Scientific Discovery

- In a comparison of search results from identical queries on WWS, Google, and Google Scholar, only 3.5% of results overlapped (i.e., WorldWideScience results were 96.5% unique)
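The slide does not describe how the comparison was carried out. As a hedged sketch of the arithmetic only, one way to compute such an overlap is to compare sets of result identifiers. The identifiers and counts below are invented so that the numbers reproduce the 3.5% figure; they are not data from the study.

```python
def overlap_percent(ours, others):
    """Percentage of our results that also appear in the other result set."""
    if not ours:
        return 0.0
    return 100.0 * len(ours & others) / len(ours)


# Invented example: 200 results from one engine, 7 of which also appear
# in the combined results of the other engines.
wws_results = {f"doc{i}" for i in range(200)}
other_results = {f"doc{i}" for i in range(7)} | {f"web{i}" for i in range(500)}

overlap = overlap_percent(wws_results, other_results)
unique = 100.0 - overlap
print(overlap, unique)
```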
Slide 11: Now, the case for Multilingual WorldWideScience.org …
Slide 12: Consider this …
While English is the lingua franca for science, these are the world's most widely spoken languages:
| Rank | Language | Estimated Number of Speakers |
|---|---|---|
| 1 | Mandarin Chinese | 1,051,000,000 |
| 2 | English | 510,000,000 |
| 3 | Hindi/Urdu | 490,000,000 |
| 4 | Spanish | 429,000,000 |
| 5 | Arabic | 280,000,000 |
| 6 | Russian | 255,000,000 |
| 7 | Portuguese | 230,000,000 |
| 8 | German | 229,000,000 |
| 9 | Bengali | 215,000,000 |
| 10 | French | 130,000,000 |
| 11 | Japanese | 127,000,000 |

(Source: Wikipedia)
Slide 13: Increasing Globalization of Science Calls for Multilingual Search Capabilities …
- Is there Science beyond English? Initiatives to increase the quality and visibility of non-English publications might help to break down language barriers in scientific communication (Meneghini and Packer, Nature, 2007)
- Science's Language Problem: As globalization increases, communication between linguistic communities could become a serious stumbling block (Barany, Business Week, 2005)
- Science on the Rise in Developing Countries (Holmgren and Schnitzer, PLoS Biology, 2004)
Slide 14: Of the world’s "top 400" institutional repositories, 250, or 63%, have some or all non-English content.
Examples:
- HAL CNRS – French
- Kyoto University Research Repository – Japanese
- Leiden University Digital Repository – Dutch
- CSIC (Spanish National Research Council)
Slide 15: Major Non-English Science "Producers"
Slide 16: Screen capture of http://www.istic.ac.cn/ (China)
Slide 17: Screen capture of http://science.viniti.ru/index.php?option=com_search&Itemid=27/ eLIBRARY.RU (Russia)
Slide 18:
- Japan
- France
- Germany
- Brazil … and many other countries.
Slide 19: To further accelerate access to science, multilingual translations are needed in both directions:
- Translation of English content for non-English speakers … and …
- Translation of non-English content for English speakers
Slide 20:
- Up until now, real-time translation of science has been limited.
- Generally limited to translating from one language into another single language at one time.
- Not deployed on deep web scientific databases.
- Results less than perfect with complex scientific language (note that it's still not perfect but is constantly improving)
Slide 21:
Now, we have the essential ingredients for real-time translation of science
- National science databases in multiple languages
- Federated search
- Multilingual translation on both front and back end of the user experience
A public-private partnership, introduced as Multilingual WorldWideScience.org (Beta)
Logos: WorldWideScience Alliance; "Translations powered by Microsoft® Translator"; "by Deep Web Technologies"; "Enabling Science and Innovation"; ICSTI (International Council for Scientific and Technical Information)
Slide 22:
Here’s how it works …
- A Chinese scientist submits a query in Chinese to Multilingual WorldWideScience.org (MWWS.org).
- MWWS.org uses Microsoft Translator to translate the Chinese query into the individual languages of the source databases (English, French, Portuguese, Russian, etc.).
- MWWS.org sends the translated queries to the corresponding databases, which search their contents and return results in their native languages to MWWS.org.
- MWWS.org uses Microsoft Translator to translate the native-language results into Chinese and presents them to the user in relevance-ranked order.
Conversely, an English-speaking user could have a query translated into languages of non-English databases and then get results back in English.
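The round trip described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the MWWS.org code: `translate` is a toy phrase-table lookup standing in for a machine translation service (it is not the Microsoft Translator API), the two source databases and their contents are invented, and the relevance-ranking step is omitted for brevity.

```python
# Toy phrase table standing in for a machine translation service (hypothetical).
PHRASES = {
    ("zh", "en"): {"太阳能": "solar energy"},
    ("zh", "pt"): {"太阳能": "energia solar"},
    ("en", "zh"): {"solar energy storage": "太阳能储存"},
    ("pt", "zh"): {"energia solar fotovoltaica": "光伏太阳能"},
}

# Invented source databases, keyed by the language of their content.
SOURCES = {
    "en": ["solar energy storage", "wind turbine design"],
    "pt": ["energia solar fotovoltaica"],
}


def translate(text, src, dst):
    """Look up a translation; fall back to the original text if none is known."""
    if src == dst:
        return text
    return PHRASES.get((src, dst), {}).get(text, text)


def search(lang, query):
    """Each database searches its own content in its native language."""
    return [title for title in SOURCES[lang] if query in title]


def multilingual_search(query, user_lang):
    results = []
    for lang in SOURCES:
        # Front end: translate the user's query into the database's language.
        native_query = translate(query, user_lang, lang)
        for title in search(lang, native_query):
            # Back end: translate each native-language result for the user.
            results.append((lang, translate(title, lang, user_lang)))
    return results


hits = multilingual_search("太阳能", "zh")
for source_lang, title in hits:
    print(source_lang, title)
```

Calling `multilingual_search` with an English query and `user_lang="en"` follows the same path in the other direction, which corresponds to the converse case mentioned on the slide.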
Slide 23: Demonstration
Slide 24:
With the launch of Multilingual WorldWideScience.org, we are …
- Opening vast reservoirs of heretofore under-utilized scientific knowledge
- Providing equal access to science for anyone on the Internet
- Promoting scientific collaboration, participation, and transparency
… and accelerating scientific discovery!


