OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: Semiotic indexing of digital resources

Abstract

A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.
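The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say which similarity measure, which entrywise combination, or which clustering algorithm is used, so this sketch assumes cosine similarity, the entrywise (Hadamard) product, and a simple threshold-based grouping. All function names, the sample documents, the term sets, and the 0.5 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def frequency_array(documents, terms):
    """Rows are documents, columns are terms; entries are occurrence counts."""
    rows = []
    for text in documents:
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        rows.append([counts[term] for term in terms])
    return rows


def cosine_similarity_matrix(freq):
    """Document-by-document cosine similarity (an assumed measure)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in freq]
    n = len(freq)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(freq[i], freq[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def entrywise_product(a, b):
    """Entrywise (Hadamard) product; one possible entrywise combination."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cluster_by_threshold(sim, threshold):
    """Assumed clustering: connected components of links with sim >= threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def classify(documents, first_terms, second_terms, threshold=0.5):
    """Two term sets -> two similarity matrices -> combine -> cluster."""
    s1 = cosine_similarity_matrix(frequency_array(documents, first_terms))
    s2 = cosine_similarity_matrix(frequency_array(documents, second_terms))
    return cluster_by_threshold(entrywise_product(s1, s2), threshold)


# Hypothetical sample data: one term set of organism names, one of habitat terms.
docs = [
    "Bacillus subtilis produces spores in soil",
    "Spores of Bacillus persist in soil",
    "Escherichia coli ferments lactose in the gut",
    "Lactose fermentation by Escherichia in the gut",
]
first_terms = ["bacillus", "escherichia"]
second_terms = ["soil", "gut", "spores", "lactose"]
labels = classify(docs, first_terms, second_terms)
print(labels)
```

With these sample inputs the script prints `[0, 0, 1, 1]`: the two Bacillus documents fall into one cluster and the two Escherichia documents into another. With an entrywise product, two documents are linked only when they are similar under both term sets. A sum or a weighted mean would be a looser combination.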

Inventors:
Parker, Charles T; Garrity, George M
Publication Date:
2014 Dec 02
Research Org.:
NamesforLife LLC, East Lansing, MI (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1164666
Patent Number(s):
8,903,825
Application Number:
13/478,973
Assignee:
NamesforLife LLC (East Lansing, MI) CHO
DOE Contract Number:  
FG02-07ER86321
Resource Type:
Patent
Resource Relation:
Patent File Date: 2012 May 23
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS; 97 MATHEMATICS AND COMPUTING

Citation Formats

Parker, Charles T, and Garrity, George M. Semiotic indexing of digital resources. United States: N. p., 2014. Web.
Parker, Charles T, & Garrity, George M. Semiotic indexing of digital resources. United States.
Parker, Charles T, and Garrity, George M. 2014. "Semiotic indexing of digital resources". United States. https://www.osti.gov/servlets/purl/1164666.
@article{osti_1164666,
title = {Semiotic indexing of digital resources},
author = {Parker, Charles T and Garrity, George M},
abstractNote = {A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2014},
month = {12}
}

Works referenced in this record:

High-Throughput Identification of Chemistry in Life Science Texts
book, January 2006

  • Corbett, Peter; Murray-Rust, Peter; Hutchison, David
  • Computational Life Sciences II, p. 107-118
  • DOI: 10.1007/11875741_11