Methods and apparatuses for information analysis on shared and distributed computing systems
Abstract
Apparatuses and computer-implemented methods for analyzing, on shared and distributed computing systems, information comprising one or more documents are disclosed according to some aspects. In one embodiment, information analysis can comprise distributing one or more distinct sets of documents among each of a plurality of processes, wherein each process performs operations on a distinct set of documents substantially in parallel with other processes. Operations by each process can further comprise computing term statistics for terms contained in each distinct set of documents, thereby generating a local set of term statistics for each distinct set of documents. Still further, operations by each process can comprise contributing the local sets of term statistics to a global set of term statistics, and participating in generating a major term set from an assigned portion of a global vocabulary.
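The workflow the abstract describes (distribute distinct document sets, compute local term statistics, contribute them to a global set, then derive major terms from assigned vocabulary portions) can be illustrated with a minimal sketch. This is not the patented implementation. It makes several assumptions that do not come from the record: threads stand in for the processes, terms are whitespace-separated tokens, the statistics are simple term and document frequencies, and a term counts as "major" when its global document frequency meets a hypothetical threshold `min_df`. All function names are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_workers):
    """Split items into n_workers distinct sets (round-robin assignment)."""
    return [items[i::n_workers] for i in range(n_workers)]


def local_term_stats(doc_set):
    """Compute term and document frequencies for one distinct set of documents."""
    term_freq, doc_freq = Counter(), Counter()
    for text in doc_set:
        tokens = text.lower().split()  # assumption: whitespace tokenization
        term_freq.update(tokens)
        doc_freq.update(set(tokens))   # count each term once per document
    return term_freq, doc_freq


def merge_stats(local_stats):
    """Combine the local sets of term statistics into one global set."""
    global_tf, global_df = Counter(), Counter()
    for tf, df in local_stats:
        global_tf.update(tf)
        global_df.update(df)
    return global_tf, global_df


def major_terms_for_slice(vocab_slice, global_df, min_df):
    """Filter one assigned portion of the global vocabulary.

    The threshold criterion is hypothetical; the record does not state
    how major terms are selected.
    """
    return {term for term in vocab_slice if global_df[term] >= min_df}


def analyze(documents, n_workers=2, min_df=2):
    """Run the four steps; threads stand in for the parallel processes."""
    doc_sets = partition(documents, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker computes statistics for its own distinct document set.
        local_stats = list(pool.map(local_term_stats, doc_sets))
        global_tf, global_df = merge_stats(local_stats)
        # Each worker is assigned a portion of the global vocabulary.
        vocab_slices = partition(sorted(global_df), n_workers)
        parts = pool.map(
            lambda s: major_terms_for_slice(s, global_df, min_df), vocab_slices
        )
        major = set().union(*parts)
    return global_tf, global_df, major


docs = ["apple banana", "banana cherry", "cherry apple banana"]
tf, df, major = analyze(docs, n_workers=2, min_df=3)
print(sorted(major))  # ['banana']
```

In this sketch the merge step runs in the coordinating thread for simplicity. On a real shared or distributed system, each process would contribute its local statistics to a shared global structure instead.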
- Inventors:
- Bohn, Shawn J; Krishnan, Manoj Kumar; Cowley, Wendy E; Nieplocha, Jarek
- Richland, WA
- Issue Date:
- Research Org.:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1015195
- Patent Number(s):
- 7895210
- Application Number:
- US Patent Application 11/540,240
- Assignee:
- Battelle Memorial Institute (Richland, WA)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- AC05-76RL01830
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Bohn, Shawn J, Krishnan, Manoj Kumar, Cowley, Wendy E, and Nieplocha, Jarek. Methods and apparatuses for information analysis on shared and distributed computing systems. United States: N. p., 2011. Web.
Bohn, Shawn J, Krishnan, Manoj Kumar, Cowley, Wendy E, & Nieplocha, Jarek. Methods and apparatuses for information analysis on shared and distributed computing systems. United States.
Bohn, Shawn J, Krishnan, Manoj Kumar, Cowley, Wendy E, and Nieplocha, Jarek. "Methods and apparatuses for information analysis on shared and distributed computing systems". United States. https://www.osti.gov/servlets/purl/1015195.
@article{osti_1015195,
title = {Methods and apparatuses for information analysis on shared and distributed computing systems},
author = {Bohn, Shawn J and Krishnan, Manoj Kumar and Cowley, Wendy E and Nieplocha, Jarek},
abstractNote = {Apparatuses and computer-implemented methods for analyzing, on shared and distributed computing systems, information comprising one or more documents are disclosed according to some aspects. In one embodiment, information analysis can comprise distributing one or more distinct sets of documents among each of a plurality of processes, wherein each process performs operations on a distinct set of documents substantially in parallel with other processes. Operations by each process can further comprise computing term statistics for terms contained in each distinct set of documents, thereby generating a local set of term statistics for each distinct set of documents. Still further, operations by each process can comprise contributing the local sets of term statistics to a global set of term statistics, and participating in generating a major term set from an assigned portion of a global vocabulary.},
place = {United States},
year = {2011},
month = {2}
}
Works referenced in this record:
HNC's MatchPlus system
journal, October 1992
- Gallant, Stephen I.; Caid, William R.; Carleton, Joel
- ACM SIGIR Forum, Vol. 26, Issue 2, p. 34-38
Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit
journal, May 2006
- Nieplocha, Jarek; Palmer, Bruce; Tipparaju, Vinod
- The International Journal of High Performance Computing Applications, Vol. 20, Issue 2