Take a well-known area of scientific or technological research. Call it X-research, where X is anything you can name: biofuels, cancer, dark matter, and so on. How do you find all of the people working in area X, plus all of their Web content, to build a comprehensive Web portal for them? That is the X-Portal challenge.
The X-Portal challenge is a lot harder than you might think, because research tends to be a seamless web, with no sharp boundaries. So bounding the community is the heart of the challenge. Moreover, everyone in the community is actually working on a different problem, some aspect of the general problem that defines the community. So there is no unique marker, no badge of membership.
Then too, we want to identify all the members and collect the community content without using subject matter experts. We want the computer to do it, with as little expert help as possible. We want a search engine that can identify a community and find its content. Sounds simple but this is actually a difficult problem. It may even be the next big thing for the science and technology Web: comprehensive community coverage.
Perhaps the easiest way to see the challenge is to consider a couple of simple methods that do not work. The simplest approach is to pick some key subject matter terminology for the community in question, then run a search on an existing search engine, such as Google or Google Scholar. Unfortunately this will not work, for several well-known reasons. First, many of the hits will be false hits, that is, hits on content that is not part of the community. Extensive expert assistance would be needed to separate the good hits from the false ones. Second, much of the content we are after will not show up at all, because different people in the community are working on different specific problems and use different language.
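To make the two failure modes concrete, here is a minimal sketch of a key-term search over a toy corpus. The titles, the community labels, and the helper name `keyword_search` are all invented for illustration. A real search engine ranks and matches in far more sophisticated ways, but the same two problems show up.

```python
# Toy corpus of (title, is_part_of_community) pairs.
# Every entry is hypothetical and exists only to illustrate the point.
DOCS = [
    ("Algal biofuels from open pond cultivation", True),
    ("Lipid extraction from microalgae for transport fuel", True),  # lacks the key term
    ("Cellulosic ethanol yield from switchgrass", True),            # lacks the key term
    ("Biofuels subsidy debate in the state legislature", False),    # has the key term, off-topic
]


def keyword_search(docs, terms):
    """Return the documents whose title contains any of the key terms."""
    lowered = [t.lower() for t in terms]
    return [doc for doc in docs if any(t in doc[0].lower() for t in lowered)]


hits = keyword_search(DOCS, ["biofuels"])

# False hits: returned by the search, but not part of the community.
false_hits = [title for title, in_community in hits if not in_community]

# Misses: part of the community, but never returned by the search.
missed = [
    title
    for title, in_community in DOCS
    if in_community and (title, in_community) not in hits
]

print("false hits:", false_hits)
print("missed:", missed)
```

Note that the labels in the second column are exactly the expert judgment we were hoping to avoid. Without them, the program has no way to tell a good hit from a false one.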
A more complex approach might be to use co-authorship and citation mapping. After all, these methods are geared to the idea of a community of research. So we might first find some of the best known authors in area X, perhaps by using standard reference materials. Then we can find all the people who have co-authored research articles with these central authors. We can also find all the people who have cited their works, as well as all that have been cited by their works. We can also map all the articles that have cited their co-authors, or been cited by them. This is the kind of social network analysis that these methods were developed for, although they have never been used in this X-Portal way.
Here again we will encounter several well-known obstacles. First, within a given community there are often several distinct groups of researchers, sometimes many, not to mention loners. Members of each group tend to cite and co-author with one another, but not with members of the other groups. In some cases the groups are rivals who have very little contact. So depending on where we start we might easily miss some important groups within the X community. Second, researchers often have more than one interest, or change interests during their careers, or get their ideas from outside the community. As a result these maps tend to wander away from the community we are interested in, sometimes very quickly.
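The mapping approach and both obstacles can be seen in a toy example. The sketch below assumes co-authorship and citation ties have already been reduced to simple author pairs. The author names, the `LINKS` data, and the functions `build_graph` and `snowball` are hypothetical, and a plain breadth-first expansion stands in for whatever mapping method one might actually use.

```python
from collections import deque

# Hypothetical ties: each pair is a co-authorship or citation link between two authors.
LINKS = [
    ("seed_a", "b"), ("b", "c"),                        # the seed author's own group
    ("c", "outsider_1"), ("outsider_1", "outsider_2"),  # ties that lead out of the field
    ("rival_x", "rival_y"),                             # a separate group in the same field
]


def build_graph(links):
    """Turn a list of author pairs into an undirected adjacency map."""
    graph = {}
    for u, v in links:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def snowball(graph, seeds, max_hops):
    """Expand breadth-first from the seed authors, following ties up to max_hops.

    Returns a dict mapping each reached author to the hop count at which
    that author was first reached.
    """
    found = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        author = queue.popleft()
        if found[author] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(author, ()):
            if neighbor not in found:
                found[neighbor] = found[author] + 1
                queue.append(neighbor)
    return found


graph = build_graph(LINKS)
reached = snowball(graph, ["seed_a"], max_hops=3)
print(reached)
```

Starting from `seed_a`, the expansion never reaches `rival_x` or `rival_y`, because no tie connects the two groups. By the third hop it has already picked up `outsider_1`, who is outside the field. Lowering `max_hops` reduces the drift but does nothing to recover the missed group.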
These simple examples are merely intended to demonstrate that the challenge is significant, not to show that it cannot be met. In fact it is quite likely that a combination of methods, including word search, citation and co-author mapping, plus several others, will meet the challenge. The trick is to find the right combination.
OSTI has been doing research into this challenge for several years, led by Dr. William Watson and a world-class team of extramural experts. See for example:
for pioneering research on the formation of scientific communities.
For more on the X-Portal see: http://www.osti.gov/ostiblog/home/entry/the_x_portal_vision_of
Dr. David Wojick is Senior Consultant on Innovation at OSTI. He also does research on the structure of scientific and technological communities.