Natural language information retrieval in digital libraries
Conference · OSTI ID:526085
- GE Corporate Research & Development, Schenectady, NY (United States)
- Rutgers Univ., New Brunswick, NJ (United States)
- New York Univ., NY (United States)
In this paper we report on some recent developments in the joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of an indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine, which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process the user's natural language requests into effective search queries. This system has been used in the NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles, including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications's Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system has been designed to scale to ever-increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be searched independently and efficiently.
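The statistical backbone and the index-splitting idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`build_index`, `split_and_index`, `search`), the whitespace tokenizer, the tf-idf weighting, and the use of per-partition document counts for idf are all assumptions made for illustration. The abstract states only that documents are indexed into inverted files and that a randomized mechanism creates smaller, independently searchable indexes.

```python
import math
import random
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Whitespace tokenization stands in for the NLP preprocessing step.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def split_and_index(docs, n_parts, seed=0):
    """Randomly assign documents to n_parts partitions and index each one.

    Returns a list of (inverted_index, partition_size) pairs.
    """
    rng = random.Random(seed)
    parts = [dict() for _ in range(n_parts)]
    for doc_id, text in docs.items():
        parts[rng.randrange(n_parts)][doc_id] = text
    return [(build_index(part), len(part)) for part in parts]


def search(sub_indexes, query, top_k=3):
    """Score each sub-index independently with tf-idf, then merge the results.

    Assumption: idf is computed per partition, which is a simplification;
    scores from different partitions are therefore only roughly comparable.
    """
    scores = Counter()
    for index, n_docs in sub_indexes:
        for term in query.lower().split():
            postings = index.get(term, {})
            if not postings:
                continue
            idf = math.log(1 + n_docs / len(postings))
            for doc_id, tf in postings.items():
                scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common(top_k)]
```

Calling `search(split_and_index(docs, 2), "language retrieval")` on a small dictionary of document texts returns the identifiers of the matching documents, ranked by merged score. Because each partition is searched on its own, the per-partition loops could in principle run in parallel, which is the scalability benefit the abstract points to.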
- OSTI ID: 526085
- Report Number(s): CONF-960372--; CNN: Contract 94-FI57900-000; Grant IRI-93-02615
- Country of Publication: United States
- Language: English
Similar Records
Indexing and retrieval strategies for natural language fact retrieval
Journal Article · Thu Sep 01 00:00:00 EDT 1983 · ACM Trans. Database Syst.; (United States) · OSTI ID:6439150
A system for UNIX command retrieval using a multilayered neural network
Technical Report · Sun Mar 31 23:00:00 EST 1991 · OSTI ID:5739015
Searching MEDLINE in English: a prototype user interface with natural language query, ranked output, and relevance feedback
Conference · Sun Dec 31 23:00:00 EST 1978 · Proc. ASIS Annu. Meet.; (United States) · OSTI ID:5047496