OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: DHS-STEM Internship at Lawrence Livermore National Laboratory

Technical Report · DOI: https://doi.org/10.2172/945740 · OSTI ID: 945740

This summer I had the fortunate opportunity, through the DHS-STEM program, to work at Lawrence Livermore National Laboratory (LLNL) with Tom Slezak on the bioinformatics team. Among other things, the bioinformatics team helps develop TaqMan and microarray probes for identifying pathogens. My main project at the laboratory was to test these probes' identification capabilities against metagenomic data from around the world, much of it from organisms that have never been individually sequenced. Using sequence analysis tools (Vmatch and Blastall) along with several tools we developed ourselves, we compared about 120 metagenomic sequencing projects against a collection of all completely sequenced genomes and against LLNL's current probe database. For the probes, we ran Blastall against each metagenomic project individually, with parameters chosen to allow for the natural ambiguities of in vitro hybridization (mismatches, deletions, insertions, hairpinning, etc.). A low cutoff eliminated poor sequence matches while leaving a large variety of higher-quality matches for future research into the hybridization of sequences with mutations and variations: any hit with at least 80% base-pair conservation over 80% of the length of the match was retained. Because our whole-genome database was so large, we used Vmatch's exact-match algorithm to search and compare genomes quickly, varying the minimum match length.
I also provided preliminary feasibility analyses to support a potential industry-funded project to develop a multiplex assay for several genera and species. Each genus and species was evaluated on the number of sequenced genomes, the number of sequenced near-neighbor genomes, the presence of identifying genes (metabolic or antibiotic-resistance genes), and the availability of research on identifying that particular genus or species.
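The 80%/80% hit cutoff described above can be expressed as a small filter over BLAST tabular output. This is a sketch under stated assumptions, not the team's actual code: it assumes hits were written in blastall's 12-column tabular format (`-m 8`), that the probes were the BLAST queries, and that "80% of the length" means alignment length divided by probe length. The function name `filter_hits`, the `probe_lengths` dictionary, and the example rows are hypothetical.

```python
# Hypothetical sketch: keep probe hits that pass an identity-and-coverage cutoff.
# Assumes blastall tabular output (-m 8): qseqid, sseqid, pident, length,
# mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.


def filter_hits(tabular_text, probe_lengths, min_identity=80.0, min_coverage=0.80):
    """Return (probe, subject, pident, aln_len) for hits passing both cutoffs.

    probe_lengths maps each probe (query) ID to its length in bases; coverage
    is taken as alignment length divided by probe length (an assumption).
    """
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and -m 9 style comment lines
        fields = line.split("\t")
        probe, subject = fields[0], fields[1]
        pident, aln_len = float(fields[2]), int(fields[3])
        coverage = aln_len / probe_lengths[probe]
        if pident >= min_identity and coverage >= min_coverage:
            kept.append((probe, subject, pident, aln_len))
    return kept


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    hits = (
        "probeA\tcontig1\t95.00\t20\t1\t0\t1\t20\t100\t119\t1e-05\t36.2\n"
        "probeA\tcontig2\t78.00\t20\t4\t0\t1\t20\t300\t319\t0.01\t28.2\n"
        "probeB\tcontig3\t90.00\t10\t1\t0\t1\t10\t50\t59\t0.5\t20.1\n"
    )
    print(filter_hits(hits, {"probeA": 20, "probeB": 25}))
```

With these example rows only the first hit survives: the second falls below 80% identity, and the third covers just 10 of probeB's 25 bases.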
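Vmatch finds exact matches using a precomputed suffix-array index. As a much simpler stand-in that illustrates the same idea of reporting exact matches above a minimum length, the sketch below seeds on k-mers of that minimum length and extends each seed into a maximal exact match. It is an illustration only: it assumes both sequences fit in memory as strings and scales far worse than Vmatch, and `maximal_exact_matches` and its parameters are hypothetical names.

```python
# Illustrative stand-in for exact-match search (not Vmatch's algorithm):
# seed on k-mers of the minimum length, then extend to maximal exact matches.
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map every length-k substring of seq to the list of its start positions."""
    index = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    return index


def maximal_exact_matches(query, ref, min_len):
    """Return (query_start, ref_start, length) for every maximal exact match
    between query and ref that is at least min_len bases long."""
    index = build_kmer_index(ref, min_len)
    matches = []
    for i in range(len(query) - min_len + 1):
        for j in index.get(query[i:i + min_len], ()):
            # Skip seeds that sit inside a longer match extending to the left;
            # that match is reported from its own leftmost seed.
            if i > 0 and j > 0 and query[i - 1] == ref[j - 1]:
                continue
            length = min_len
            # Extend to the right while the bases keep matching.
            while (i + length < len(query) and j + length < len(ref)
                   and query[i + length] == ref[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches


if __name__ == "__main__":
    print(maximal_exact_matches("ACGTACGTTT", "TTACGTACGA", 4))
```

Here the call returns `[(0, 2, 7), (3, 1, 5)]`: a 7-base match starting at query position 0 and reference position 2, and a 5-base match starting at positions 3 and 1. Raising the minimum length to 6 keeps only the first, which mirrors how varying the lower length limit trades sensitivity for speed and output size.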
Using the bioinformatics team's software, I developed or updated several TaqMan probes for the genera and species under evaluation and drew up an identification plan for the more difficult ones. For a genus with low sequence conservation, one suggestion was to divide its species into several groups, look for probes within each group, and then identify the genus by a combination of those probes. This has the added benefit of also providing subgenus-level identification in larger genera.
During both projects I developed a set of computer programs to simplify or consolidate several processes. These programs were written to be reused, whether to repeat these results, to further this research, or to start a similar project. A big problem in the bioinformatics and sequencing field is the variety of data storage formats, which makes using data from different sources extremely difficult. Even setting aside the many errors in genome sequences from online databases, converting one data format into another reliably every time remains difficult. Dealing with hundreds of files, each hundreds of megabytes in size, requires automation, which in turn requires good data-mining software. The programs I developed will help ease this problem and make more genomic sources available for use. With these programs it is extremely easy to gather the data, clean it, convert it, run it through analysis software, and even analyze that software's output. When dealing with vast amounts of data it is vital for the researcher to optimize the process, which became clear to me with only ten weeks to work with.
Because of the internship's time constraint, I was unable to finish my metagenomic project, but I did successfully finish my second project, developing TaqMan-based identification for the selected genera and species. Although I did not complete my first project, I made significant findings along the way that suggest the need for further research on the subject.
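One way to read the group-based strategy for low-conservation genera is as a simple decision rule: each species group has its own probe panel, a group is called when all of its probes are positive, and the genus is called when any group is. This is a hypothetical sketch, not the report's actual assay logic; the function `call_genus`, the panel layout, and the all-probes-positive rule are assumptions.

```python
# Hypothetical sketch: call a genus from a combination of group-level probe panels.


def call_genus(detected, group_panels):
    """Return (genus_detected, matched_groups).

    detected: set of probe names that gave a positive signal.
    group_panels: dict mapping each species group to the set of probes that
    must all be positive for that group to be called (an assumed rule).
    """
    matched = sorted(
        group
        for group, probes in group_panels.items()
        if probes and probes <= detected  # empty panels never match
    )
    # Any matched group calls the genus; the matched groups themselves
    # give a subgenus-level identification.
    return bool(matched), matched


if __name__ == "__main__":
    panels = {"group1": {"p1", "p2"}, "group2": {"p3"}}
    print(call_genus({"p1", "p2", "p9"}, panels))  # (True, ['group1'])
    print(call_genus({"p1"}, panels))              # (False, [])
```

Returning the matched groups alongside the yes/no call is what gives the subgenus information for free: a positive genus call always comes with the group (or groups) responsible for it.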
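The gather, clean, and convert steps described above depend heavily on the formats involved. As a minimal sketch assuming FASTA input, the following reads records, cleans each sequence (dropping non-letter characters such as spaces, digits, and gap symbols, upper-casing, and replacing anything other than A, C, G, T, or N with N), and writes the result back out as FASTA with fixed-width lines. The function names and cleaning rules are illustrative assumptions, not the report's programs.

```python
# Illustrative sketch: read FASTA, clean sequences, write wrapped FASTA.

VALID_BASES = set("ACGTN")


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples.

    Sequence lines that appear before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean_sequence(seq):
    """Keep letters only, upper-case them, and map non-ACGTN letters to N."""
    letters = "".join(c for c in seq if c.isalpha()).upper()
    return "".join(b if b in VALID_BASES else "N" for b in letters)


def to_fasta(records, width=60):
    """Format (header, sequence) records as FASTA with fixed-width lines."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    raw = ">seq1 desc\nacgtRY\nACG T\n>seq2\nNNac\n"
    cleaned = [(h, clean_sequence(s)) for h, s in read_fasta(raw.splitlines())]
    print(to_fasta(cleaned, width=4), end="")
```

Mapping IUPAC ambiguity codes such as R or Y to N is a deliberate simplification that keeps downstream tools from choking on unexpected letters; a pipeline feeding hybridization analysis might prefer to keep them.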
I found several instances of false positives for our microarray probes in the metagenomic data, which indicates the need to sequence more metagenomic samples. My initial research shows the importance of expanding our known metagenomic world: at this point, because not enough has been sequenced, there is always the likelihood of developing probes with unknown interactions. On the other hand, my research did point to the sensitivity and quality of LLNL's microarrays when it identified a Parvoviridae infection in a mosquito metagenomic sample from southern California. It also uniquely identified the presence of several adenovirus species, which could mean either that some archaic adenovirus strain was present in the metagenomic sample or that the sample was contaminated; further investigation is required to determine which.

Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization: USDOE
DOE Contract Number: W-7405-ENG-48
OSTI ID: 945740
Report Number(s): LLNL-TR-406492; TRN: US200904%%137
Country of Publication: United States
Language: English