On October 14, OSTI announced that the People's Republic of China had joined the WorldWideScience Alliance. The press release making the announcement described, and hinted at, the importance of China's contribution:
China, a major producer of journals and conference proceedings, is offering searches of key Chinese English-language scientific literature through WorldWideScience.org. The Chinese resource enables searching of over 6,000 journals.
WorldWideScience.org, the global science gateway managed by the WorldWideScience Alliance, is intended to enhance scientific communication in order to accelerate international scientific progress by serving as a single, sophisticated point of access for diverse scientific resources and expertise from nations around the world.
The Importance of China's Participation
The addition of China is a notable milestone for a number of reasons.
China is a major global contributor to scientific knowledge. Thomson Reuters makes the point clearly:
According to citation analysis based on data from Web of Science, China is ranked second in the world by number of scientific papers published in 2007. Scientific's World IP Today Report on Global Patent Activity 2007 reported that China almost doubled its volume of patents from 2003 to 2007, and...Read more...
Compared to the pre-Web world of the early 1990s, OSTI now enables about a thousand-fold more information transactions. An information transaction occurs when a customer receives requested information, for example when search results are delivered or when a document is displayed after a link is clicked. But the mind-boggling growth in the number of transactions is only part of the story.
OSTI is founded on the principle that science advances only if knowledge is shared. The OSTI Corollary takes this concept to a new level. It holds that accelerating the spread of knowledge accelerates the advance of science. The advance of science can also be accelerated by funding more bright scientists. In the following blog article, Dr. Bob Marianelli reminisces and gives his perspectives about advancing science throughout his remarkable career.
Dr. Marianelli led a distinguished career as a DOE Program Manager and Director of the Chemistry Division. He had the privilege to shape and manage the process by which the Department of Energy identifies bright chemists and follows their progress. Along the way, he fostered the work of many truly extraordinary scientists, including six who went on to win the Nobel Prize, perhaps the top honor a scientist can receive. In addition to fostering the work of top scientists, Dr. Marianelli played a key role in the construction of a huge facility at Pacific Northwest National Lab, and he positively influenced the direction of other major research facilities.
Love of science and learning from an early age
SL: What inspired you to pursue a science career?
BM: Well, I was very much interested in mathematics and science from my earliest recollection even before I started school. And, my siblings - my older brother and sister - encouraged me when I was very young because they could see that I was very good with numbers. Since they were six and eight years older they would give me all kinds of help and interesting challenges....Read more...
In the first two parts to this post (Forms of STI and Forms of STI - pt. 2), I talked about how there are different forms of scientific and technical information and each is published and disseminated in its own way. OSTI has different search tools to access the different types of STI. I also discussed technical reports, journal literature, conference proceedings and papers, and e-prints. After defining each of these types of STI, I described the OSTI products that search each. This post will finish the discussion by covering patents, project summaries, and theses/dissertations.
Patents allow the spread of information about technological inventions while protecting the property rights of the inventor. A patent issued by the U.S. Patent and Trademark Office excludes others from making, using, offering for sale, or selling the invention throughout the U.S. or importing the invention into the U.S. for a limited time in exchange for public disclosure of the invention when the patent is granted. This public disclosure is extremely important in furthering scientific research.
Technology moves on, but information remains useful forever
Thomas Jefferson, an inventor himself and appointed by George Washington to the first Patent Board, was, essentially, the first patent examiner. He found that "the issue of patents for new discoveries has given a spring to invention beyond my conception." (As a graduate of the University of Virginia, I always like to work in a Jefferson quote in my writings.)
DOE and its predecessor agencies, ERDA and AEC, are responsible for creating a tremendous amount of new technology....
Related Topics: dissertations, DOE Research & Development (R&D) Accomplishments, DOE Research and Development (R&D) Project Summaries, E-Print Network (EPN), Energy Citations Database (ECD), Energy Files, Federal R&D Project Summaries, Information Bridge (IB), osti, patents, project summaries, sti, theses
In the first part to this post, Forms of STI, I talked about how there are different forms of scientific and technical information and how each is published and disseminated in its own way. OSTI has different search tools to access the different types of STI. In the last post I discussed technical reports. Now I will cover journal literature, conference proceedings and papers, and e-prints, defining each and pointing out the OSTI search tools that cover each.
The publication of research in scientific journals started in the mid-seventeenth century. Before that and for some time after, scientific and technical information was circulated via letters, printed tracts and books. Journals became a preferred medium because journal publishers worked to achieve wider dissemination and faster publication. Today, however, even with the tremendous growth in scientific journals in the latter half of the twentieth century, publishing in scientific journals is most often not a speedy process. It can often take a year or more for an article to be published once it has been accepted by a journal. For this reason, many scientists and engineers also utilize other means to share their research. Options include technical reports, conference papers, pre-prints and a growing use of e-prints.
From 1948 to 1976, the Atomic Energy Commission published Nuclear Science Abstracts, providing comprehensive indexing of the international nuclear science literature, including journal literature on a worldwide basis. This literature can now be found using Energy Citations Database. ECD...
Related Topics: conference papers, conference proceedings, E-Print Network (EPN), e-prints, Energy Citations Database (ECD), Energy Files, Information Bridge (IB), journal articles, osti, Science Conference Proceedings, Science.gov, sti
Authors of DOE scientific and technical reports are getting their research results made electronically available worldwide courtesy of the Office of Scientific and Technical Information, http://www.osti.gov/.
OSTI is making research results from work performed under DOE-sponsored contracts available via an array of web-based outlets including powerful federated searching products such as Science Accelerator, Science.gov, and WorldWideScience.org.
Whenever OSTI receives a scientific and technical report from a facility doing work for the Department of Energy, OSTI processes that report and makes it publicly available. Authors listed on the report are notified that their work is publicly available and are given the URL where they can view their report. Moreover, authors receiving this notification are eligible to request the reference linking service that OSTI provides. Using this service, authors can ask OSTI to add hyperlinks, where available, to the references at the end of their report.
Authors are encouraged to submit additional research reports to OSTI in order to increase awareness of their research activities, to provide their findings to a broad and diverse audience of potential beneficiaries, and to add to the body of scientific knowledge in their field of study.
This service is one of the many activities OSTI conducts as a part of its ongoing efforts to ensure that research results from billions of dollars of DOE-sponsored research and development contracts are made available to the world's scientific community.
For additional information, contact Debbie Nuchols, email@example.com.
Today, all of OSTI's information products are on the web. This is in sharp contrast to the situation as recently as the mid-1990s, when OSTI had no products on the web.
First becoming popular in 1994, the web quickly emerged as a transformational technology, and its potential for reshaping OSTI was apparent. Recognizing the opportunity to advance the OSTI mission, OSTI set out to capitalize on it as quickly as resources would allow by producing web applications to disseminate all manner of scientific and technical information (STI). A steady progression of new OSTI products addressed the various forms of STI: technical reports, e-prints, conference proceedings, accomplishments, patents, and project descriptions. To make it easy for users who want to search through all these products at once, we introduced the DOE Science Accelerator, which is powered by our special web architecture called federated search. Reaching out beyond DOE, we initiated a collaboration with other agencies to allow users to search their R&D results along with DOE's; thus emerged Science.gov. Most recently, we took collaboration worldwide by federating the best information sources from governments around the world. The result is WorldWideScience, which makes searchable about the same quantity of science as Google does.
Over the years, OSTI has upgraded each of its products, so that, today, they offer more to users than ever before. Such upgrades are made possible...Read more...
by Walt Warnick and Sol Lederman
This is the second in a three-part series of articles about the deficiencies of web crawling and indexing, the superiority of federated search to the serious researcher, and the value of OSTI federated search applications in advancing science. Part 1 identified a number of serious limitations of Google and the other crawlers. This article shows how federated search overcomes these limitations. The final article in the series highlights a number of federated search applications and databases that OSTI makes available to the public.
In Part 1, we explained that Google, being a surface web crawler, cannot access the deep web, which consists of content that resides in databases. We also noted that the deep web is several hundred times larger than the surface web and that a large percentage of the highly sought-after scientific and technical information resides in the deep web. We also explained that there is no way to determine the quality of any particular document in the surface web. Any web citizen can post a document to the web and it will likely be indexed.
Federated search applications overcome the two aforementioned limitations of surface crawlers - (1) limited access to content, and (2) the difficulty in determining its quality. Limited access is overcome by the federated search engine's specialized knowledge of how to query a database and how to retrieve its documents. The quality concern is overcome by the complementary efforts of database owners and creators of federated search applications. First, databases that are made available to federated search applications are managed by owners, or organizations, who have criteria for...Read more...
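To make the idea of "specialized knowledge of how to query a database" concrete, here is a minimal sketch in Python. It is an illustration only, not OSTI's implementation: the two in-memory collections, the connector functions search_reports and search_eprints, and the federated_search helper are all hypothetical names invented for this example. Each connector knows how to query one source in that source's own format, and the federated search sends the same query to every connector at once and merges the answers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "databases"; real sources would be remote, and each
# would need its own connector that knows how to query it.
REPORTS = [{"title": "Solar cell efficiency report", "year": 2007}]
EPRINTS = [("Solar neutrino e-print", "2008"), ("Protein folding e-print", "2006")]


def search_reports(term):
    # Connector for a source that stores its records as dictionaries.
    return [{"title": r["title"], "source": "reports"}
            for r in REPORTS if term.lower() in r["title"].lower()]


def search_eprints(term):
    # Connector for a source that stores its records as tuples.
    return [{"title": title, "source": "eprints"}
            for title, _year in EPRINTS if term.lower() in title.lower()]


def federated_search(term, connectors):
    """Send one query to every source at the same time and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(connectors)) as pool:
        result_lists = list(pool.map(lambda connector: connector(term), connectors))
    merged = [hit for hits in result_lists for hit in hits]
    # Sorting by title is a stand-in for real relevance ranking.
    return sorted(merged, key=lambda hit: hit["title"])


hits = federated_search("solar", [search_reports, search_eprints])
for hit in hits:
    print(hit["source"], "-", hit["title"])
```

The design point is that the user issues one query while the per-source differences stay hidden inside the connectors. Adding a new database means writing one more connector, not changing the search itself.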
by Walt Warnick and Sol Lederman
The web is growing.
For providing searchable access to the content that matters the most to scientists and researchers, Google and the other web crawlers can't keep up. Instead, growing numbers of scientists, researchers, and science-attentive citizens turn to OSTI's federated search applications for high quality research material that Google can't find. And, given fundamental limitations on how web crawlers find content, those conducting research will derive even more benefit from OSTI's innovation and investment in federated search in the coming years.
This is the first of three articles that discuss and compare the strengths and weaknesses of two web search architectures: the crawling and indexing architecture as used today by Google and the federated search architecture used by Science.gov and WorldWideScience.org. This article points out the limitations of the crawling architecture for serious researchers. The second article explains how federated search overcomes these obstacles. The third article highlights a number of OSTI's federated search offerings that advance science, and suggests that federated search may someday become the dominant web search architecture.
Google is a "surface web" crawler; it discovers content by taking a list of known web pages and following links to new web pages and to documents. This approach finds documents that have links referencing them. It finds none of the majority of web content that is contained in the "deep web."
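The link-following approach can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how Google actually works: the LINKS dictionary is a made-up miniature web, and "db-record-17" is a hypothetical stand-in for a record that lives in a database and has no page linking to it.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
# "db-record-17" exists, but no page links to it.
LINKS = {
    "home": ["about", "papers"],
    "about": ["home"],
    "papers": ["paper-1", "paper-2"],
    "paper-1": [],
    "paper-2": ["home"],
    "db-record-17": [],
}


def crawl(seeds, links):
    """Start from a list of known pages and follow links breadth-first."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl(["home"], LINKS)
unreached = sorted(set(LINKS) - found)
print("crawled:", sorted(found))
print("never found:", unreached)
```

However long the crawl runs, the unlinked record is never discovered, because the crawler can only reach what some known page points to.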
The deep web...Read more...
In Part 1 of this series I provided an overview of the technology that drives the E-print Network. In this article I will provide some detail about how the harvested collection, the "E-prints on Web Sites" component of the E-print Network, is constructed. In Part 3, I will discuss the technology of the portion of the E-print Network that relies on federated search of databases.
In Part 1 I explained that the E-print Network combines federated sources searched in real-time with harvested content. The harvested content, consisting of over 1.3 million e-prints, is found by directing a crawler to 28,000 web sites belonging to scientists, researchers, and members of the academic community. In OSTI terminology, harvesting is synonymous with conducting a directed crawl of web sites.
Before we look at the technology behind the harvesting, let's consider the question of why the content is harvested at all. Why not search the contributors' web sites in real-time in the same way that other collections are searched in real-time via federated search? There are several reasons for harvesting the content. First, a large number of e-prints are not found in databases. They are predominantly stored as document files in web server directories. Accessing files stored this way is the job of a web crawler, not that of a federated search engine. This is the case because a crawler, once it locates the index page for a set of e-prints, easily harvests all e-prints referenced in that index page. The second reason...Read more...
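As a rough illustration of harvesting from an index page, the Python sketch below pulls every link to a document file out of one page of HTML. It is not the E-print Network's harvester: the EprintLinkParser class, the sample INDEX_HTML, the example.org address, and the choice of .pdf and .ps as the file types of interest are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EprintLinkParser(HTMLParser):
    """Collects links to document files found on one index page."""

    def __init__(self, base_url, extensions=(".pdf", ".ps")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions  # assumed file types, for illustration
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.extensions):
            # Resolve relative links against the index page's own address.
            self.documents.append(urljoin(self.base_url, href))


# Hypothetical index page of the kind a researcher might keep on a web site.
INDEX_HTML = """
<ul>
  <li><a href="papers/qcd-2007.pdf">QCD notes</a></li>
  <li><a href="papers/lattice.ps">Lattice paper</a></li>
  <li><a href="contact.html">Contact</a></li>
</ul>
"""

parser = EprintLinkParser("http://example.org/~researcher/")
parser.feed(INDEX_HTML)
documents = parser.documents
for url in documents:
    print(url)
```

A real harvester would go on to download and index each file it finds. The sketch stops at the list of document addresses, which is the step the index page makes easy.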