by Dr. William Watson on Wed, July 28, 2010
OSTI's current services accelerate science through what is largely a kind of card file. We point people to particular pieces of literature or data that meet certain search criteria. From there, people can build on what those pieces of information tell them and achieve new discoveries and inventions.
Some of what users achieve involves combining the information they retrieve with knowledge of their own that isn't represented in databases. This of course requires some thought from the users. But other achievements could result entirely from information that users retrieve through OSTI, with no additional input whatsoever--namely, inferences made directly from that information alone. Right now, such inferences still generally require user involvement. But software programs designed and tested in the last several years can automate some inferences from text and data tables. In biology and medicine, these programs have already turned up connections in the literature that could accelerate our understanding, and thus treatment, of some poorly understood diseases. Among the most recent inferential programs is Semantic Medline, which displays conceptual interconnections across multiple search results in a single graph, thus showing the searcher how his query's terms relate to other concepts, some of which he may not already know.
If it were permanently left to unaided human users to make these inferences themselves, very few would ever be made, since no user knows every fact mentioned in the entire science and technology literature. Computers, on the other hand, can check large sets of literature for explicit links between concepts and infer chains of such links to reveal unsuspected relations in the physical world. The text-analysis software currently used can provide such inferences even though it still misses much of what human readers can pick up automatically from what they do read. Text-analysis software embodies less semantic and grammatical knowledge than human readers do, but it can apply that knowledge much faster to a larger set of literature than a human being can handle, and thus make up for its deficiency to present its user with more inferences than he would ever find from his own reading alone.
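As a rough illustration of what "inferring chains of links" can mean, the sketch below takes concept pairs that are each stated explicitly in some report and proposes pairs that are never linked directly but share an intermediate concept. It is a minimal two-hop example under stated assumptions, not a description of how Semantic Medline or any other existing program works: the concept names are placeholders, and the function name `infer_indirect_links` is hypothetical.

```python
from collections import defaultdict


def infer_indirect_links(explicit_links):
    """Map each pair (a, c) that is never linked directly to the sorted list
    of intermediate concepts b for which a-b and b-c are both explicit links."""
    # Treat links as undirected: record each concept's direct neighbors.
    neighbors = defaultdict(set)
    for a, b in explicit_links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    inferred = defaultdict(set)
    for middle, linked in neighbors.items():
        for a in linked:
            for c in linked:
                # a < c visits each unordered pair once; skip pairs already linked.
                if a < c and c not in neighbors[a]:
                    inferred[(a, c)].add(middle)
    return {pair: sorted(mids) for pair, mids in inferred.items()}


# Placeholder links, as if each had been extracted from a different report.
links = [("A", "B1"), ("B1", "C"), ("A", "B2"), ("B2", "C"), ("C", "D")]

for (a, c), mids in sorted(infer_indirect_links(links).items()):
    print(f"{a} may relate to {c} via {', '.join(mids)}")
```

No single report in this toy set connects A with C, yet two independent intermediates suggest a relation worth a reader's attention. A real service would also need to extract the links from text and rank the candidates, which this sketch leaves out.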
The greatest advances in this kind of science acceleration have been in biomedicine. Yet there is no reason these advances could not be adapted to accelerate other fields as well, if the effort were made. The adaptation would require incorporating semantic information about these other fields into the text-analysis software. Success would not only accelerate the other fields; that acceleration should in turn help advance medicine, since the technology of modern diagnostic instruments, surgery, and medical data processing is as dependent on physical science and mathematics as on biology.
A service that presents users with inferences from a set of literature goes beyond present-day card-file services, which simply direct people to sets of reports and leave them to dig out and synthesize the details on their own. An implication-inferring service can accelerate science further by automatically pooling information from different reports and synthesizing their more obvious implications before the user digs into them--or, for that matter, before anybody else has.
To produce a "Semantic Science Accelerator" with minimum effort, we would need to
- learn as much as we can about the Semantic Medline code's design rationale, and
- get the kind of semantic information from the ETDE/INIS Thesaurus that NLM got from its Metathesaurus.
This would accelerate our own effort since we would build upon NLM's work instead of trying to reinvent it.
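To make "semantic information from a thesaurus" a little more concrete, here is a minimal sketch of two things text-analysis software might draw from one: a mapping from variant terms to a preferred term, and a chain of broader terms. The data structure, the field names `use_for` and `broader`, and the sample entries are all invented for illustration; they are assumptions, not the actual format or content of the ETDE/INIS Thesaurus or the NLM Metathesaurus.

```python
# Invented toy entries; field names and terms are assumptions for illustration.
TOY_THESAURUS = {
    "photovoltaic cells": {
        "use_for": ["solar cells"],
        "broader": ["direct energy converters"],
    },
    "direct energy converters": {
        "use_for": [],
        "broader": ["energy conversion devices"],
    },
    "energy conversion devices": {"use_for": [], "broader": []},
}


def build_synonym_index(thesaurus):
    """Map every preferred term and variant (lowercased) to its preferred term."""
    index = {}
    for preferred, entry in thesaurus.items():
        index[preferred.lower()] = preferred
        for variant in entry["use_for"]:
            index[variant.lower()] = preferred
    return index


def broader_chain(thesaurus, preferred):
    """List all broader terms reachable from a preferred term, nearest first."""
    chain, seen, frontier = [], {preferred}, [preferred]
    while frontier:
        term = frontier.pop(0)
        for parent in thesaurus.get(term, {}).get("broader", []):
            if parent not in seen:
                seen.add(parent)
                chain.append(parent)
                frontier.append(parent)
    return chain


index = build_synonym_index(TOY_THESAURUS)
concept = index["solar cells"]
print(concept, "->", broader_chain(TOY_THESAURUS, concept))
```

With a mapping like this, two reports that use different words for the same concept can be recognized as discussing the same thing, which is a precondition for pooling their statements into a single inference.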
We would also profit from any lessons from NLM's experience. One important lesson is already evident: their significant results have been achieved incrementally by the accumulation of small accomplishments over time. Few if any of their single efforts have been huge breakthroughs in user empowerment, but their cumulative effect does amount to a breakthrough--as has been the case with other "cumulative breakthroughs" in science and technology.
OSTI already has people with the requisite knowledge and end-user perspective to develop the new semantic component.
July 27, 2010
 See, e.g., "Novel Protein-Protein Interactions Inferred from Literature Context", "Discovering Hidden Knowledge from Biomedical Literature", both available online.