DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Classification of nucleotide sequences by latent semantic analysis

Abstract

DNA sequences are analyzed using latent semantic analysis. A set of nucleotide sequences is received in which the set has a first number of sequences. A set of basis vectors is determined, in which the set has a second number of basis vectors, the second number being smaller than the first number. Each basis vector represents a specific combination of predetermined nucleotide segments. For each of the nucleotide sequences, an approximate representation of the nucleotide sequence is determined based on a combination of the basis vectors. For each pair of nucleotide sequences, a distance between the pair of nucleotide sequences is determined according to the distance between the approximate representations of the pair of nucleotide sequences. The set of nucleotide sequences is classified based on the distances between the pairs of nucleotide sequences.
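
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; several choices are assumptions made only for illustration: the "predetermined nucleotide segments" are taken to be all 2-mers over A, C, G, T, the basis vectors are approximated by power iteration on the k-mer count matrix (a stand-in for a full singular value decomposition), distances are Euclidean, and classification is a simple distance-threshold grouping. All function names, the toy sequences, and the threshold value are hypothetical.

```python
import itertools
import math
import random


def kmer_counts(seq, k=2):
    """Count each possible k-mer over ACGT, in a fixed vocabulary order."""
    vocab = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers with other symbols are skipped
        if j is not None:
            counts[j] += 1.0
    return counts


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def basis_vectors(rows, r, iters=100, seed=0):
    """Approximate the top-r right singular vectors of the count matrix.

    Assumption: power iteration on X^T X, re-orthogonalized against the
    vectors already found, replaces a library SVD routine.
    """
    d = len(rows[0])
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(d)]
            for i in range(d)]
    rng = random.Random(seed)
    basis = []
    for _ in range(r):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [dot(g_row, v) for g_row in gram]
            for b in basis:  # remove components along earlier basis vectors
                proj = dot(w, b)
                w = [wi - proj * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("fewer than r independent directions")
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def project(row, basis):
    """Approximate representation: coordinates of a sequence in the basis."""
    return [dot(row, b) for b in basis]


def pairwise_distances(points):
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]


def classify(dist, threshold):
    """Group sequences linked by distances below `threshold`.

    Assumption: single-linkage grouping is used only as an example rule.
    """
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy input: two AT-rich and two GC-rich sequences (hypothetical data).
sequences = ["ATATATATATTA", "TATATATAATAT", "GCGCGCGCCGCG", "CGCGCGCGGCGC"]
rows = [kmer_counts(s) for s in sequences]
basis = basis_vectors(rows, r=2)  # fewer basis vectors than sequences
reduced = [project(row, basis) for row in rows]
dist = pairwise_distances(reduced)
labels = classify(dist, threshold=1.0)
print(labels)
```

In this toy input the two AT-rich sequences end up in one group and the two GC-rich sequences in another, because sequences with similar 2-mer composition have nearly identical coordinates in the reduced space. On real data a numerical library's SVD routine would normally replace the hand-written power iteration.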

Inventors:
Sayood, Khalid; Way, Sam; Nalbantoglu, Ozkan Ufuk; Garrity, George
Issue Date:
Research Org.:
NUtech Ventures, Lincoln, NE (United States); NamesforLife, LLC, Lansing, MI (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1531990
Patent Number(s):
9659145
Application Number:
13/954,925
Assignee:
NUtech Ventures (Lincoln, NE); NamesforLife, LLC (Lansing, MI)
Patent Classifications (CPCs):
G - PHYSICS G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
DOE Contract Number:  
FG02-07ER86321
Resource Type:
Patent
Resource Relation:
Patent File Date: 2013-07-30
Country of Publication:
United States
Language:
English

Citation Formats

Sayood, Khalid, Way, Sam, Nalbantoglu, Ozkan Ufuk, and Garrity, George. Classification of nucleotide sequences by latent semantic analysis. United States: N. p., 2017. Web.
Sayood, Khalid, Way, Sam, Nalbantoglu, Ozkan Ufuk, & Garrity, George. Classification of nucleotide sequences by latent semantic analysis. United States.
Sayood, Khalid, Way, Sam, Nalbantoglu, Ozkan Ufuk, and Garrity, George. 2017. "Classification of nucleotide sequences by latent semantic analysis". United States. https://www.osti.gov/servlets/purl/1531990.
@article{osti_1531990,
title = {Classification of nucleotide sequences by latent semantic analysis},
author = {Sayood, Khalid and Way, Sam and Nalbantoglu, Ozkan Ufuk and Garrity, George},
abstractNote = {DNA sequences are analyzed using latent semantic analysis. A set of nucleotide sequences is received in which the set has a first number of sequences. A set of basis vectors is determined, in which the set has a second number of basis vectors, the second number being smaller than the first number. Each basis vector represents a specific combination of predetermined nucleotide segments. For each of the nucleotide sequences, an approximate representation of the nucleotide sequence is determined based on a combination of the basis vectors. For each pair of nucleotide sequences, a distance between the pair of nucleotide sequences is determined according to the distance between the approximate representations of the pair of nucleotide sequences. The set of nucleotide sequences is classified based on the distances between the pairs of nucleotide sequences.},
place = {United States},
year = {2017},
month = {5}
}