OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: LINGUISTIC ANALYSIS OF THE NUCLEOPROTEIN GENE OF INFLUENZA A VIRUS

Conference
OSTI ID:768970

We applied a linguistic analysis approach, specifically N-grams, to classify nucleotide and amino acid sequences of the nucleoprotein (NP) gene of Influenza A virus isolated from a range of hosts and geographic regions. We considered the frequencies of single letters (1-grams), letter pairs (2-grams), and letter triplets (3-grams). Classification trees based on 1-, 2-, and 3-gram variables were constructed for the nucleotide and amino acid sequences of the same NP strains, and their classification efficiency was compared with the clustering obtained by phylogenetic analysis. The results show that, for the NP gene, disregarding positional information can provide the same level of recognition accuracy as alternative, more complex classification techniques.
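
As an illustration of the N-gram variables described in the abstract, the following Python sketch turns a sequence into a vector of 1-, 2-, and 3-gram relative frequencies, discarding positional information. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ngram_frequencies` and `feature_vector`, the use of overlapping windows, the normalization to relative frequencies, and the toy sequence are all hypothetical choices made here for illustration.

```python
from collections import Counter
from itertools import product


def ngram_frequencies(sequence, n, alphabet="ACGT"):
    """Relative frequency of every possible n-gram over `alphabet`.

    Assumption: n-grams are counted with overlapping windows; the record
    does not say how the original study counted them.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Enumerate all possible n-grams so every sequence yields the same keys.
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    # Windows containing symbols outside the alphabet are ignored.
    total = sum(counts[k] for k in keys)
    return {k: (counts[k] / total if total else 0.0) for k in keys}


def feature_vector(sequence, alphabet="ACGT", max_n=3):
    """Concatenate 1..max_n-gram frequencies into one feature dictionary."""
    features = {}
    for n in range(1, max_n + 1):
        features.update(ngram_frequencies(sequence, n, alphabet))
    return features


# Made-up nucleotide fragment for illustration only (not real NP gene data).
toy_sequence = "ATGGCGTCTCAAGGCACC"
vector = feature_vector(toy_sequence)
```

For amino acid sequences, the same functions could be called with the 20-letter amino acid alphabet passed as `alphabet`. Vectors of this kind, one per strain, are the sort of input a classification tree could be trained on; the tree construction itself is not shown here.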

Research Organization:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
US Department of Energy (US)
DOE Contract Number:
W-7405-ENG-36
Report Number(s):
LA-UR-00-2303; TRN: US200311%%234
Resource Relation:
Conference: Conference title not supplied, Conference location not supplied, Conference dates not supplied; Other Information: PBD: 1 May 2000
Country of Publication:
United States
Language:
English