Title: HIV sequence compendium 2002

This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Traditionally, we present the sequence data themselves in the form of alignments: Section II, an alignment of a selection of HIV-1/SIVcpz full-length genomes (many LAI-like sequences, for example, have been omitted because they are so similar that they bias the alignment); Section III, a combined HIV-1/HIV-2/SIV whole genome alignment; Sections IV–VI, amino acid alignments for HIV-1/SIVcpz, HIV-2/SIV, and SIVagm. The HIV-2/SIV and SIVagm amino acid alignments are separate because the genetic distances between these groups are so great that a single alignment would be very elongated by the large number of gaps that would have to be inserted. As always, tables with extensive background information gathered from the literature accompany the whole genome alignments. The collection of whole-gene sequences in the database is now large enough that we have abundant representation of most subtypes. For many subtypes, and especially for subtype B, a large number of sequences that span entire genes were not included in the printed alignments to conserve space. A more complete version of all alignments is available on our website. Importantly, all these alignments have been edited to include only one sequence per person, based on phylogenetic trees that were created for all of them, as well as on the literature. Because of the number of sequences available, we have decided to use a different selection principle this year, based on the epidemiological importance of the subtypes. Subtypes A–D and CRFs 01 and 02 are by far the most widespread variants, and for these (when available) we have included 8–10 representatives in the alignments.
The other subtypes and CRFs are of lesser importance, and of these 4–5 each, or as many as are available, were included. In the alignments we have also included the ‘Circulating Recombinant Forms’ (CRFs), mosaic genomes that have epidemiological significance. See the 1999 review of nomenclature for more on CRFs; a separate overview of the patterns of known CRFs is also available. Amino acid alignment chapters begin with an annotation table that includes sequence names, accession numbers, genomic region represented, author, and references. We have made an effort to bring the HIV-2/SIV and SIVagm alignments up to date as well.
Technical Report
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
United States