LINGUISTIC ANALYSIS OF THE NUCLEOPROTEIN GENE OF INFLUENZA A VIRUS
We applied a linguistic analysis approach, specifically N-grams, to classify nucleotide and amino acid sequences of the nucleoprotein (NP) gene of Influenza A virus isolated from a range of hosts and geographic regions. We considered letter frequencies (1-grams), letter-pair frequencies (2-grams), and triplet frequencies (3-grams). Classification trees based on 1-, 2-, and 3-gram variables were constructed for the same set of NP nucleotide and amino acid sequences, and their classification efficiency was compared with the clustering obtained using phylogenetic analysis. The results show that disregarding positional information for the NP gene can provide the same level of recognition accuracy as alternative, more complex classification techniques.
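As an illustration of the position-independent features described in the abstract, the following minimal Python sketch counts overlapping 1-, 2-, and 3-grams in a sequence and converts them to relative frequencies. The function names (`ngram_counts`, `ngram_features`) and the toy sequence are assumptions made for this example; the record does not describe the authors' actual implementation, normalization, or choice of overlapping versus non-overlapping windows.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-grams in a sequence, ignoring where they occur."""
    seq = sequence.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_features(sequence, max_n=3):
    """Relative frequencies of all 1..max_n-grams, keyed by the n-gram itself.

    Frequencies are normalized separately for each n (an assumption of this
    sketch), so the 1-gram values sum to 1, the 2-gram values sum to 1, etc.
    """
    features = {}
    for n in range(1, max_n + 1):
        counts = ngram_counts(sequence, n)
        total = sum(counts.values())
        for gram, count in counts.items():
            features[gram] = count / total
    return features


# Hypothetical toy fragment for demonstration only, not a real NP sequence.
toy_sequence = "ATGGCGTCTCAAGGC"
toy_features = ngram_features(toy_sequence)
```

Feature dictionaries of this kind, one per sequence, could then be passed to any classification-tree implementation; the same functions apply unchanged to amino acid sequences, since they treat the input as a plain string of letters.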
- Research Organization:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- US Department of Energy (US)
- DOE Contract Number:
- W-7405-ENG-36
- OSTI ID:
- 768970
- Report Number(s):
- LA-UR-00-2303; TRN: US200311%%234
- Resource Relation:
- Conference (title, location, and dates not supplied); Other Information: PBD: 1 May 2000
- Country of Publication:
- United States
- Language:
- English