SciTech Connect

Title: LINGUISTIC ANALYSIS OF THE NUCLEOPROTEIN GENE OF INFLUENZA A VIRUS

We applied a linguistic analysis approach, specifically N-grams, to classify nucleotide and amino acid sequences of the nucleoprotein (NP) gene of the Influenza A virus isolated from a range of hosts and geographic regions. We considered letter frequencies (1-grams), letter-pair frequencies (2-grams), and triplet frequencies (3-grams). Classification trees based on 1-, 2-, and 3-gram variables were constructed for the same NP nucleotide and amino acid strains, and their classification efficiency was compared with the clustering obtained using phylogenetic analysis. The results show that disregarding positional information for the NP gene can provide the same level of recognition accuracy as alternative, more complex classification techniques.
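As an illustration of the N-gram variables described in the abstract, the following is a minimal sketch of how 1-, 2-, and 3-gram frequencies could be computed for a single sequence. It is not the authors' implementation. The function names `ngram_frequencies` and `feature_vector`, the use of overlapping windows, the normalization to relative frequencies, and the toy sequence are all assumptions made for this example; the record does not specify these details.

```python
from collections import Counter


def ngram_frequencies(sequence, n):
    """Return relative frequencies of overlapping n-grams in a sequence.

    Assumption: n-grams are taken with a sliding window of step 1 and
    normalized by the number of windows. The source record does not say
    how the original study counted or normalized them.
    """
    sequence = sequence.upper()
    total = len(sequence) - n + 1
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


def feature_vector(sequence, max_n=3):
    """Combine 1..max_n gram frequencies into one feature dictionary.

    Keys of different lengths cannot collide, so a single flat dictionary
    is enough. Positional information is discarded by construction.
    """
    features = {}
    for n in range(1, max_n + 1):
        features.update(ngram_frequencies(sequence, n))
    return features


if __name__ == "__main__":
    # Hypothetical toy fragment, not real NP gene data.
    toy_sequence = "ATGGCGTCT"
    for gram, freq in sorted(feature_vector(toy_sequence).items()):
        print(gram, round(freq, 3))
```

Feature dictionaries of this kind, one per strain, could then be passed to any classification-tree implementation. The same functions apply unchanged to amino acid sequences, since they only treat the input as a string of letters.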
Authors:
;
Publication Date:
OSTI Identifier:
768970
Report Number(s):
LA-UR-00-2303
TRN: US200311%%234
DOE Contract Number:
W-7405-ENG-36
Resource Type:
Conference
Resource Relation:
Conference: Conference title not supplied, Conference location not supplied, Conference dates not supplied; Other Information: PBD: 1 May 2000
Research Org:
Los Alamos National Lab., NM (US)
Sponsoring Org:
US Department of Energy (US)
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; ACCURACY; AMINO ACIDS; CLASSIFICATION; EFFICIENCY; GENES; INFLUENZA; NUCLEOPROTEINS; NUCLEOTIDES; STRAINS; TREES; TRIPLETS