The recent launch of a new multilingual search capability for international science, Multilingual WorldWideScience.org (see www.science.doe.gov), represents a significant step toward increasing connectivity and communication in global science. Hosted at the DOE Office of Scientific and Technical Information within the Office of Science, this instant access to international scientific literature acquires special significance in today's era of international science and large multinational collaborations.
Some might say the language of science is mathematics. Others would vote for experiments and observational data. In yet another sense, simulations and modeling allow predictive science. Along with these 'universal' languages of science, scientists need to communicate via the spoken and written word. With other nations increasing their investments in science and technology, and often publishing in their native languages, we may miss out on new results because language barriers restrict our access and search tools. Likewise, the dissemination of our science to geographically and linguistically distant colleagues is not fully successful if we are losing sections of non-English-speaking readers.
Today we tend to take our easy access to information for granted. Without committing to actual years, many of us remember how difficult it used to be to access research publications in our efforts to understand the past to create science for the future. Especially difficult were the situations when we needed to find international journals and decipher publications in foreign languages. Sometimes...Read more...
On June 11, the Multilingual WorldWideScience.org BETA was officially launched in Helsinki, Finland, at the International Council for Scientific and Technical Information (ICSTI) annual conference. This new capability is the result of an international public-private partnership between the WorldWideScience.org Alliance and Microsoft Research, whose translation technology has been paired with the federated searching technology of Deep Web Technologies.
WorldWideScience.org now provides the first-ever real-time searching and translation across globally dispersed, multilingual scientific literature. Multilingual WorldWideScience.org BETA allows users to conduct a single query of over 70 scientific databases from around the world. Results can then be translated into the user’s preferred language. Currently, nine languages are available (Chinese, English, French, German, Japanese, Korean, Portuguese, Spanish, and Russian) and more languages will be added in the coming months. With the pace of non-English scientific publishing continuing to grow, it is vitally important that English-speaking scientists gain access to non-English content. Conversely, Multilingual WorldWideScience.org BETA also benefits non-English-speaking users by enabling translations of English-language content.
Since its inception in 2007, WorldWideScience.org has grown from searching 12 databases in 10 countries to searching over 70 databases in 66 countries, covering more than 400 million pages of science. OSTI serves as the Operating Agent for WorldWideScience.org, and as the product manager, I have been enormously honored to lead this project over the past three years. From the beginning, the goal behind WorldWideScience.org has been to broaden access to the world’s scientific information and to facilitate the scientific discovery process. With each new database that has been added to WorldWideScience.org’s searches...Read more...
Discovery services have begun to appear in the search landscape. Discovery services provide access to documents from publishers with which they have relationships by indexing the publishers’ metadata and/or full text. Discovery services are marketed to libraries where patrons appreciate near-instantaneous search results and where library staff are willing to restrict access to sources available from the service (and optionally the library's own holdings). While these services tout themselves as improvements to federated search, the reality is that there is no alternative to federated search for a number of important applications.
WorldWideScience.org is a global gateway to science. The federated search application was conceived and developed at OSTI and is hosted by us. The portal performs live federated search of 70 databases from 66 countries. Participating members provide access to their national research databases. For a number of reasons, this important gateway to millions of research documents does not lend itself to the discovery service model.
The second major challenge is that metadata does not exist for documents in many of the databases in WorldWideScience.org. Discovery services rely upon metadata to "homogenize" information about documents that they place in their...Read more...
“World Wide Science is the world’s most important scientific resource, where the global science community can share knowledge.” This remarkable encomium did not come from just any casual observer, but from a leader of one of the world’s top information organizations. In an interview with Information World Review, Richard Boulderstone, director of e-strategy and information systems at the British Library, shared this perspective.
Boulderstone elaborated further. “It enables researchers to search over 50 national databases simultaneously and freely access high quality, authoritative information on cutting-edge scientific research. It makes available more than 360 million pages of information covering energy, medicine, agriculture and the environment, and continues to expand.”
This is an enormous compliment to everyone who has put so much hard work into creating and maintaining WorldWideScience.org. Congratulations to all.
In a world replete with information sources and options, it is imperative to offer users something unique. WorldWideScience.org (WWS.org), a federated search product that currently provides a single point of access to 61 scientific databases and portals from more than 60 countries, is a unique scientific discovery tool. Representing more than three-fourths of the world’s population, WWS.org enables access to over 400 million pages of science from around the globe. Many of the databases searched through WWS.org are not well known outside their originating countries and are not easily accessible through typical commercial search engines. In fact, a recent analysis indicated that WWS.org results, when compared to Google and Google Scholar results, were unique approximately 96.5% of the time. Some examples of the wide range of information that a user might find on WWS.org are:
As OSTI Director Walt Warnick likes to say, today's Web is like the Model T Ford -- revolutionary but ready for vast improvement. This is especially true when it comes to making the Web work for science and technology. In that spirit I want to describe a new kind of Web Portal, one which has yet to be built. It is called the X-Portal.
An X-Portal provides comprehensive coverage for a specific science or technology community, where X refers to that community. In other words, an X-Portal for biofuels is a comprehensive biofuels portal. X = Neutron Science gives a comprehensive neutron science portal, and so on. There can be as many X-Portals as there are communities, but each has a similar design.
The need for X-Portals
The need for X-Portals is based on the fact that today's search engines and portals typically provide less than 5% coverage of any given science community. Today's Web portals and search engines, while revolutionary, are technologically immature and far from comprehensive. As a result they do relatively little to overcome the cognitive barrier of findability. One can usually find something relevant, but it is seldom the best thing out there. With 5% coverage the odds are 19 to 1 against finding the best accessible content. Moreover, if the coverage extends to a large number of other communities, as with Google, even that 5% may be swamped by hits on other communities.
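The "19 to 1" figure follows directly from the 5% coverage number. A minimal sketch of the arithmetic, assuming the best item is equally likely to sit anywhere in the community's content, so that the chance it is covered equals the coverage fraction (the function name `odds_against` is illustrative, not from any library):

```python
from fractions import Fraction


def odds_against(coverage: Fraction) -> Fraction:
    """Odds against the best item being covered, given the covered fraction.

    Assumes the best item is equally likely to be anywhere in the content,
    so P(covered) = coverage and odds against = (1 - coverage) / coverage.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return (1 - coverage) / coverage


# 5% coverage: 95 chances of missing for every 5 chances of hitting.
print(odds_against(Fraction(5, 100)))  # prints 19, i.e. odds of 19 to 1 against
```

Under the same assumption, raising coverage to 50% would bring the odds down to even.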
There are two principal reasons for these deficiencies. First, today's portals try to be too broad, so they wind up being shallow. This means they capture only a small fragment of any given technical community. Second, because they are so broad, they cannot make use of the emerging technologies of federation, semantic analysis, mapping and visualization. These new technologies require a certain amount of analytical effort that is specific to each community. When the content is too broad, these technologies are prohibitively difficult to apply.
Every scientist knows that science advances only if knowledge is shared. Mathematically, this statement implies that the advance of science is a function of both sharing research results and doing the original research. In principle, therefore, decision makers face the problem of deciding how much to spend on original research and how much to spend on sharing the knowledge that comes out of research.
Consider the accompanying graph with the x-axis being the fraction of research resources expended on spreading knowledge. The scale would range from 0% to 100%. The y-axis is the pace of scientific discovery. One can imagine a curve plotting the pace of discovery as a function of the fraction of resources expended on sharing knowledge.
When the fraction of resources is 0%, the pace of science advance is zero, as nothing is shared. When the fraction of resources is 100%, the pace of advance is also zero, as nothing is spent on the research itself. In between these endpoints, the plot will have a maximum. The plot is the Knowledge Investment Curve.
While we show a conceptualization of the Knowledge Investment Curve, we know very little about the actual form of this curve, or even how much is currently invested in sharing.
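To make the shape of the argument concrete, here is a toy sketch only. The functional form below (pace proportional to the sharing fraction times the research fraction) is an assumption chosen because it is zero at both endpoints and has a single interior maximum. It is not the actual Knowledge Investment Curve, whose form is unknown, and the name `pace_of_discovery` is hypothetical.

```python
def pace_of_discovery(share_fraction: float) -> float:
    """Toy curve (assumed form): pace = sharing effort x research effort.

    Zero when nothing is shared (0.0) and zero when nothing is left
    for the research itself (1.0).
    """
    if not 0.0 <= share_fraction <= 1.0:
        raise ValueError("share_fraction must be between 0 and 1")
    return share_fraction * (1.0 - share_fraction)


# Scan the x-axis in 1% steps and pick the fraction with the highest pace.
grid = [i / 100 for i in range(101)]
best = max(grid, key=pace_of_discovery)

print(pace_of_discovery(0.0), pace_of_discovery(1.0))  # both endpoints give zero pace
print(best)  # location of the maximum for this assumed form
```

The peak at the midpoint is an artifact of the symmetric form assumed here. A real curve could peak anywhere between the endpoints, which is exactly the open question.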
Most knowledge-sharing activities are not funded directly as budget items. They include writing an estimated one million research papers and reports a year worldwide, as well as finding and reading them. They include preparing for and participating in conferences, as well as writing and reading emails, blogs, etc. They also include training postdocs and Ph.D. students, plus an untold number of colleague to colleague...Read more...
A typical misconception I face when I tell people that I work within the government is that my job, even though it is in the technology arena, must move at a snail's pace relative to the commercial sector. This preconceived notion that our government crawls along when it comes to technology adoption and innovation is, at least in my experience, way off the mark.
Here at OSTI we can cite several examples where we have been on the bleeding edge of technological development. Not only have we been on the bleeding edge, in some cases we have been on that bleeding edge in cooperation with some of the largest, most innovative technology companies in the world.
For example, OSTI has been a pioneering force in federated search technology since the late 1990s. Federated search, for those of you new to the term, is the simultaneous search of multiple online databases or web resources from a single query. The Wikipedia article on federated search is an excellent resource for more information on exactly how federated search works.
Before the term "federated search" had been coined, OSTI was implementing pioneering technology that would come to be known as federated search. In April 1999, OSTI launched EnergyPortal Search, a product now encompassed in EnergyFiles. EnergyPortal Search was the first federated search application deployed by OSTI and the first product of its kind in the government. In December 2002, OSTI launched Science.gov, the first ever search capability across major science agencies. In June 2007, OSTI introduced the concept of WorldWideScience.org, which searches across national and international...Read more...
WorldWideScience provides a one-stop search engine to mine global scientific databases in the deep web
The internet has revolutionized society by changing the way people communicate, find information, and enjoy entertainment. But a standard internet search misses at least 90 percent of the information available.
The internet is separated into two unequal pools of information. The surface web contains pages of information that are indexed by popular search engines. The second pool of information is locked away in the deep web, which consists of countless databases worldwide.
According to Walt Warnick, Director of the DOE Office of Scientific and Technical Information (OSTI), "The deep web is huge."
Common search engines like Google and Yahoo crawl across the thousands of internet pages on the surface web, but are unable to dig into the databases to retrieve information from the deep web.
"Asking a scientist, engineer, or educator to find information in their field using common web browsers is like asking a doctor to diagnose disease without X-rays, MRI, or any other piece of diagnostic equipment," said Warnick.
Information in the deep web can only be mined for data using search engines designed for that particular database. Many of the search engines that are available to mine databases do not use relevance ranking, making filtering through the information a crapshoot.
"Under the current system, finding information in the deep web is a series of practical impossibilities, placing internet users, especially scientists and science educators, at a severe disadvantage," said Warnick.
To address the global science need, OSTI has launched WorldWideScience.org, a science gateway that accelerates the search for data in national and international scientific databases and portals...Read more...
Did you know that science information is available via web "mashups"? Web "mashups" combine multiple products/services into a single application for the purpose of consolidating information with an easy-to-use interface.
The Department of Energy (DOE) Office of Scientific and Technical Information (OSTI) uses "mashups" to return search results from Science Accelerator, Science.gov, and WorldWideScience.org. These "mashups" include external sources of information, in these cases from Wikipedia and EurekAlert!, that are provided as a service to the user for help with additional background information or with the ability to further study their topic.
These "mashups" are made possible by OSTI's use of a federated search to perform all-encompassing searches of important databases and collections. Science Accelerator searches U.S. Department of Energy (DOE) databases of scientific and technical information representing billions of dollars of DOE research. Science.gov searches U.S. government agency scientific databases and web pages. WorldWideScience.org searches national and international scientific databases and portals.
Federated searching provides each of the three products with one-stop simultaneous searching of multiple networked data resources via a single query. When a query is entered, it is sent to selected databases, collections, and/or web portals that are available for searching. The individual data resources send back results, which are ranked in relevance order and are provided to the user as "mashups". Users can examine these "mashups" to find specific results that contain information that is useful to...Read more...
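The query flow described above (one query fanned out to several data resources, results sent back, then merged in relevance order) can be sketched roughly as follows. This is a simulation under stated assumptions, not OSTI's implementation: the `SOURCES` dictionary stands in for remote databases, and the term-count score is a deliberately naive placeholder for real relevance ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote databases: source name -> document titles.
SOURCES = {
    "energy_db": ["Biofuels from algae", "Neutron scattering basics"],
    "health_db": ["Algae toxins and health", "Vaccine trial results"],
    "agri_db": ["Algae as animal feed", "Soil carbon survey"],
}


def search_source(name, query):
    """Search one source; score = number of query terms present in the title."""
    terms = query.lower().split()
    hits = []
    for title in SOURCES[name]:
        words = title.lower().split()
        score = sum(term in words for term in terms)
        if score:
            hits.append((score, name, title))
    return hits


def federated_search(query, sources=None):
    """Send one query to every selected source at once, then merge and rank."""
    names = list(sources or SOURCES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_source = pool.map(lambda n: search_source(n, query), names)
        merged = [hit for hits in per_source for hit in hits]
    # Highest score first; ties broken by source name, then title.
    return sorted(merged, key=lambda hit: (-hit[0], hit[1], hit[2]))


for score, source, title in federated_search("algae feed"):
    print(score, source, title)
```

A live system would replace `search_source` with network calls to each database's own search interface, which is why the fan-out runs concurrently: the overall wait is bounded by the slowest source, not by the sum of all of them.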