Semiotic indexing of digital resources
Abstract
A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.
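The abstract lists the steps but does not name the similarity measure, the combination operator, or the clustering method. The Python sketch below fills those gaps with stand-ins chosen only for illustration: cosine similarity between frequency rows, the entrywise (Hadamard) product, and single-linkage clustering at a fixed threshold. The function names, the two term lists, and the sample documents are hypothetical and are not taken from the patent.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: cosine similarity, the Hadamard product and
# threshold single-linkage clustering are assumed stand-ins, not the
# patent's specified choices. All names below are hypothetical.


def frequency_array(docs, terms):
    """Rows are documents, columns are terms; each cell counts occurrences.

    Assumption: terms are single words matched case-insensitively
    against alphanumeric tokens.
    """
    rows = []
    for doc in docs:
        counts = Counter(re.findall(r"[a-z0-9]+", doc.lower()))
        rows.append([counts[term.lower()] for term in terms])
    return rows


def cosine_similarity_matrix(freq):
    """Document-by-document cosine similarity of the frequency rows.

    A document with no matching terms gets similarity 0 to everything.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in freq]
    n = len(freq)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(freq[i], freq[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def entrywise_product(a, b):
    """Hadamard product: multiply matching cells of two n-by-n matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def threshold_clusters(sim, threshold):
    """Single-linkage clustering: connected components of the graph that
    links documents i and j whenever sim[i][j] >= threshold."""
    parent = list(range(len(sim)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sim)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical vocabularies: taxon names and chemical names.
TAXA = ["escherichia", "bacillus", "salmonella"]
CHEMICALS = ["glucose", "lactose", "ethanol"]
DOCS = [
    "Escherichia ferments glucose and lactose.",
    "Escherichia grows on glucose; lactose is also used.",
    "Bacillus produces ethanol.",
    "Bacillus spores and ethanol tolerance in Bacillus.",
    "Escherichia produces ethanol.",
]

combined = entrywise_product(
    cosine_similarity_matrix(frequency_array(DOCS, TAXA)),
    cosine_similarity_matrix(frequency_array(DOCS, CHEMICALS)),
)
print(threshold_clusters(combined, 0.5))  # [[0, 1], [2, 3], [4]]
```

Documents 0 and 1 agree on both vocabularies, as do documents 2 and 3, so their combined similarity stays near 1. Document 4 matches documents 0 and 1 on taxa and documents 2 and 3 on chemicals, but the entrywise product is nonzero only where both matrices are nonzero, so it ends up in a cluster of its own. Averaging the two matrices instead would be a softer alternative that tolerates agreement in only one vocabulary.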
- Inventors:
- Parker, Charles T.; Garrity, George M.
- Issue Date:
- 2014 December
- Research Org.:
- NamesforLife LLC, East Lansing, MI (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1164666
- Patent Number(s):
- 8903825
- Application Number:
- 13/478,973
- Assignee:
- NamesforLife LLC (East Lansing, MI)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- FG02-07ER86321; FG02-04ER63933
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 2012 May 23
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 99 GENERAL AND MISCELLANEOUS; 97 MATHEMATICS AND COMPUTING
Citation Formats
Parker, Charles T., and Garrity, George M. Semiotic indexing of digital resources. United States: N. p., 2014. Web.
Parker, Charles T., & Garrity, George M. (2014). Semiotic indexing of digital resources. United States.
Parker, Charles T., and Garrity, George M. December 2014. "Semiotic indexing of digital resources". United States. https://www.osti.gov/servlets/purl/1164666.
@article{osti_1164666,
title = {Semiotic indexing of digital resources},
author = {Parker, Charles T. and Garrity, George M.},
abstractNote = {A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.},
doi = {},
journal = {},
place = {United States},
year = {2014},
month = {12}
}