In the OSTI Collections: Genomics
Dr. William N. Watson, Physicist
DOE Office of Scientific and Technical Information
Energy and environment
- Prochlorococcus, Vibrio, Shewanella
- New technology
- Structural genomics
- Reports Available through OSTI's SciTech Connect
- Reports Available through OSTI's DOepatents
- Additional References
Genomics is a subfield of genetics that deals with an organism’s genome, or complete set of genetic material. This material, the DNA, consists of a few components—the same few in every organism—whose exact sequence is unique to the cells of each organism. This sequence represents for the organism what an operating system represents for a machine: a master program that sets the parameters of the machine’s (or organism’s) functioning, which include, for the case of an organism, parameters for growth and development. Some portions of the program directly indicate the structure of molecules (proteins, RNA) that the organism synthesizes to construct itself or carry out its functions; cells assemble proteins based on RNA copies of DNA segments that describe those proteins. Other portions of the program specify the conditions under which the various synthesizing processes are turned on or off. (See Figure 1.)
Genomics has an obvious logical connection to branches of biology like medicine and agriculture, but less obviously it relates to their subbranches that pertain to clean energy generation and the characterization and cleanup of our environment. As the U.S. Department of Energy is concerned with these, it sponsors a great deal of genomics research. Genomics is intrinsically related to some problems (e.g., how an organism functions or what the population dynamics are in some ecosystem) and extrinsically to others (e.g., how to produce or transform some substance when living organisms might be able to produce or transform it on the required scale). Reports issued in the last several months, available through SciTech Connect, suggest the breadth of this genomics research and the variety of its applications.
Figure 1. Sequences of a cell’s DNA components are transcribed by the cell to messenger RNA (mRNA), which the cell later translates into corresponding sequences of amino acids that constitute proteins, whose presence or absence in the cell affects the transcription of DNA data into mRNA sequences. (From p. 7, “2012 q-bio Summer School: QB1 - Stochastic Gene Regulation”[SciTech Connect], Los Alamos National Laboratory.)
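The two steps named in the Figure 1 caption, transcription and translation, can be illustrated with a minimal Python sketch. The 15-base "gene" below is hypothetical, and the codon table holds only the five codons this example needs (the full genetic code has 64); the sketch also assumes the DNA is given as the coding strand, so the mRNA is the same sequence with U in place of T.

```python
# Toy codon table: only the codons used below (assumption: a tiny subset
# of the standard genetic code is enough for this illustration).
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": None,   # stop codon
}

def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        protein.append(amino_acid)
    return protein

dna = "ATGGCTTGGAAATAA"  # hypothetical gene, not taken from any report
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print(protein)
```

The feedback shown in the figure, in which proteins affect further transcription, is not modeled here.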
While it doesn’t represent the entire Energy Department effort in genomics, the 2013 Progress Report[SciTech Connect] of the Energy Department’s Joint Genome Institute (JGI) does provide much information about it. This institute provides facilities for researchers to determine genomes’ component sequences[Wikipedia] and computationally analyze them. Page 9 of the report describes the institute’s mission areas:
- Bioenergy … Sequencing projects at the DOE JGI that contribute to meeting this goal focus on one of three categories: developing plants that can be used as feedstocks for biofuel production, characterizing enzymes from fungi and microbes to break down the lignin and cellulose in plant walls, and identifying microorganisms that can photosynthesize or ferment sugars into biofuels.
- Carbon Cycle … The carbon cycle is heavily dependent on the microbes that process and fix atmospheric carbon, promoting plant growth and degrading organic material. As microbes constitute the largest component of the Earth’s biodiversity, understanding how they metabolize carbon, and how environmental changes affect these processes, is crucial. The DOE JGI is sequencing large numbers of microbes and microbial communities that contribute to carbon cycling. With this information, researchers can develop better predictive models that could provide more effective contributions toward reducing the effects of increasing carbon dioxide emissions on the global climate.
- Biogeochemistry The carbon cycle is not the only process that regulates the natural environment …. Microbes and microbial communities that can degrade or otherwise transform environmental contaminants such as toxic chemicals or heavy metals are another area of focus for the DOE JGI.
Some of the specific problems in these areas, and how genome data is involved in solving them, are indicated by the following Institute accomplishments in genome data acquisition:
- Participation in the sequencing of the simplest cotton genome, that of the species Gossypium raimondii. Aside from the significance of cotton for existing industry worldwide (U.S. production and processing alone contribute about $27 billion and 200,000 jobs to the economy), cotton bolls’ being almost pure cellulose makes them a potentially important source of biofuel. Understanding gene function in the biosynthesis of cellulose is “fundamental to improved biofuels production” according to Jeremy Schmutz, head of the institute’s Plant Program, who led this effort. The G. raimondii genome is to be the foundation for sequencing the genome of G. hirsutum, which makes up most of the worldwide field crop.
- Sequencing the genome of a strain of Emiliania huxleyi algae (“Ehux”) and comparing it with 12 other strains’ genomes. Ehux algae, which surround themselves with self-produced calcium carbonate shells in a process that releases carbon dioxide, also trap organic carbon derived from carbon dioxide. The Ehux genome comparison showed that different individuals possess a shared core of genes supplemented by different sets of genes that are thought to be useful in dealing with the individuals’ local environments. The variability found “helps explain the alga’s ability to thrive in oceans from the equator to the subarctic and cause algal blooms in the spring and summer that can cover several hundred thousand square kilometers” (areas of ocean on the order of land areas like California or Texas). Some of the core genes let Ehux thrive in low phosphorus levels and assimilate and break down nitrogen-rich compounds. Since the study also indicated that Ehux can produce a compound that can influence cloud formation, thus influencing climate, researchers would like to investigate how much of this compound Ehux produces and under what conditions.
- Participating in the DNA sequencing of individual microbes from the saline Deep Lake in the Vestfold Hills of Antarctica, and comparing the sequence data with microbial community information sampled at various depths of the lake. Deep Lake is salty enough that it never freezes, despite temperatures 20 degrees Celsius below the freezing point of pure water at standard atmospheric pressure. According to team leader Rick Cavicchioli (University of New South Wales, Australia), enzymes that can function under the extreme conditions of Deep Lake, like those which the microbes’ DNA programs them to make, could serve as catalysts for synthesizing peptides (short proteinlike molecules)[Wikipedia] and for enhanced oil recovery, and can function in mixtures of water and organic solvents—“especially useful for transforming contaminated sites with particularly high levels of petroleum-based products.”
Figure 2. “The towering white cliffs of Dover in England are made of the chalky white shells that envelop the single-celled photosynthetic alga, E. huxleyi. 'Ehux' is a coccolithophore, with an exoskeleton made of calcium carbonate. Even though the process by which the alga’s 'armor' forms releases carbon dioxide, Ehux can trap as much as 20 percent of organic carbon, derived from CO2, in some marine ecosystems.” (2013 Progress Report[SciTech Connect], Joint Genome Institute, pp. 28-29.)
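The Ehux comparison described above, a shared core of genes plus strain-specific additions, amounts to a set computation. The sketch below is only an illustration: the strain names and gene identifiers are hypothetical placeholders, not data from the Progress Report.

```python
# Hypothetical strains and gene identifiers, for illustration only.
strain_genes = {
    "strain_A": {"g1", "g2", "g3", "g7"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g3", "g5"},
}

# Core genome: genes present in every strain.
core = set.intersection(*strain_genes.values())

# Variable genes: whatever each strain carries beyond the shared core.
variable = {name: genes - core for name, genes in strain_genes.items()}

print(sorted(core))
for name in sorted(variable):
    print(name, sorted(variable[name]))
```

In a real comparison, deciding which genes count as "the same" across strains is itself a substantial analysis step that this sketch takes as given.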
Individual university labs also describe genomics research related to energy production and our environment. “To bioethanol through genomics of microbial synergies”[SciTech Connect], a final report on a Northeastern University project, describes an effort to advance microbial cultivation techniques and address the hypothesis that “uncultivatable” microorganisms and their consortia represent an untapped source of novel species for efficient bioethanol[Wikipedia] production. The researchers designed a new type of diffusion chamber (“isolation chips” or “ichips”) for incubating microorganisms and found the ichips able to grow and isolate environmental microorganisms in soils and water. (See Figure 3.) The ichips were then used at various sites and habitats around Boston where soil samples had indicated microbial abundances between 1 billion and 100 billion cells per gram of soil. The ichips have provided a new means of cultivating microorganisms, but in the course of this project, none of the species grown in the ichips have turned out to produce bioethanol more efficiently than already-known strains do.
Figure 3: Isolation chips (“ichips”) for incubating microorganisms designed to avoid conventional incubators’ potential to interfere with tools for detecting heavy metal precipitation. “The ichip consists of three plates with matching holes, as illustrated … . The middle plate is dipped into a cell suspension and then removed; the through-holes sample the suspension in the form of tiny droplets. Then 0.03-[micrometer] membranes are placed over both sides of this plate, and the remaining two plates with matching holes are placed over the membranes. The ichip is thus a 5-layer sandwich held together by screws, which, once tightened, press the membranes against the central plate sealing the contents of the individual holes. The ichip is then incubated in the natural environment, which provides the cells inside the miniature diffusion chambers with the natural suit of nutrients and growth factors.” (From “To bioethanol through genomics of microbial synergies”[SciTech Connect], p. 2.)
“The Ecology and Genomics of CO2 Fixation in Oceanic River Plumes”[SciTech Connect], from the University of South Florida, notes that the plumes of rivers emptying into oceans are “tremendous sources of CO2 and dissolved inorganic carbon”. The report describes how carbon-dioxide concentrations in river plumes correlate with the way different microorganisms’ carbon-fixation genes express molecular-synthesis instructions[Wikipedia] where the Mississippi and Orinoco Rivers empty into the Atlantic Ocean. In the Mississippi plume, the “high plume”/low salinity region near the shore (with salinity no greater than 32 parts per thousand) had “tremendous” carbon-dioxide drawdown correlated to carbon-fixation gene expression in heterokonts[Wikipedia] (organisms, mostly algae, whose cells have nuclei and whose flagellate cells have two differently shaped flagella[Wikipedia] for propulsion). Based on tracer studies using the stable isotope nitrogen-15, the main form of nitrogen involved in this fixation was found to be urea, which was believed to originate from artificial fertilizer contained in the Mississippi River watershed. In the Mississippi “intermediate plume” (salinity 34 parts per thousand), carbon fixation by Synechococcus was high, while in nonplume stations, expression of Prochlorococcus carbon-fixation genes was positively correlated with dissolved CO2 concentrations. Correlations in the Orinoco plume were somewhat different; among other differences, the high-plume carbon-fixation gene expression of heterokonts was much smaller than that in the Mississippi plume, “most likely” because of less artificial nutrient in the Orinoco watershed.
The genomics of a quite different environmental concern, low-dose ionizing radiation, are discussed in the UC Davis report “Genome Wide Evaluation of Normal Human Tissue in Response to Controlled, In vivo Low-Dose Low LET Ionizing Radiation Exposure: Pathways and Mechanisms”[SciTech Connect]. The “LET” in the title stands for “Linear Energy Transfer”, the transfer of energy from a particle of radiation to the material it travels through.[Wikipedia] “Pathways” refers to sequences of metabolic reactions that link biochemical compounds together.[Wiktionary] Among the findings:
- nine gene groups in skin samples from individuals undergoing radiotherapy were found to have a transient change in how their DNA was transcribed to RNA when examined 3, 8, and 24 hours after the patients’ exposure to low radiation doses;
- in skin exposed to low radiation doses in vivo, several cell functions are affected that involve biochemical pathways that link DNA segments, molecules synthesized according to those segments’ instructions, and other cell substances;
- while both low radiation doses and arsenic oxidatively damage proteins, at least some proteins respond differently to each.
The researchers also examined radiation responses of skin cell lines grown as monolayers and of discarded surgical skin for comparison with in vivo skin response. Findings from these experiments were published in works cited in this report.
Prochlorococcus, Vibrio, Shewanella
The genomes of three microorganism genera are of particular interest to the Department of Energy, as the following reports demonstrate.
The MIT report “Genomic Structure, Metagenomics, Horizontal Gene Transfer, and Natural Diversity of Prochlorococcus and Vibrio”[SciTech Connect] summarizes several investigations to “develop a deep understanding of the design of Prochlorococcus[Wikipedia] and Vibrio[Wikipedia] cells, the variations in their designs, and the constraints that have shaped this variation at the cell-environment interface” from “individual cell design to the dynamics of large populations.” The specific aims of these researches were to (1) identify the patterns of genome diversity within and among natural Prochlorococcus and Vibrio populations, and relate these patterns to the cellular metabolism, population genetics, and the ecology of these groups, (2) characterize the design of the cellular machineries of Prochlorococcus and Vibrio, (3) characterize the role of phage and horizontal gene transfer in shaping genome diversity, population structure, and metabolism in Prochlorococcus, and (4) harness natural genetic diversity and processes for strain engineering—a technologically significant purpose.
In one of the many activities described in this brief report, different populations of Vibrio bacteria were identified by genome. Examining how these bacteria are associated with animal populations showed that while zooplankton[Wikipedia] are colonized by specific Vibrio populations, larger animals are colonized by generalist populations. According to the report authors, rapid colonization of larger animals via food items and rapid turnover of populations within them can explain the greater diversity of their Vibrio colonies.
In studies of the very small Prochlorococcus bacteria (~600 nanometers across, smaller than the minimum thickness of a typical human red blood cell[Wikipedia]), genomes were determined for cells from both the Atlantic and the Pacific, and from different depths and seasons at a single location. New strains were also isolated based on specific nutritional requirements. One discovery from this activity was the existence of nitrate-assimilating strains that were previously thought not to exist. The investigators also examined Prochlorococcus’ manufacture of proteins. As mentioned above, cells assemble proteins based on RNA copies of the DNA segments that describe those proteins. The investigators found that for particular proteins, the Prochlorococcus cells’ peak production of the RNA copies occurred 4 to 6 hours before the production of the proteins themselves peaked. This suggests that other important mechanisms besides the RNA production itself regulate Prochlorococcus protein manufacture.
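The lag between RNA and protein peaks can be read off a pair of time series by comparing when each reaches its maximum. The sketch below uses made-up abundance values sampled every two hours; the numbers are not from the MIT report and serve only to show the calculation.

```python
# Made-up measurements (arbitrary units), sampled every 2 hours.
hours   = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
mrna    = [1, 3, 7, 9, 6, 4, 2, 1, 1, 1, 1, 1]
protein = [1, 1, 2, 4, 7, 9, 8, 5, 3, 2, 1, 1]

def peak_time(times, levels):
    """Return the time at which the measured level is highest."""
    return max(zip(times, levels), key=lambda pair: pair[1])[0]

# Positive lag means the protein peak follows the RNA peak.
lag = peak_time(hours, protein) - peak_time(hours, mrna)
print("RNA peak at hour", peak_time(hours, mrna))
print("protein peak at hour", peak_time(hours, protein))
print("lag in hours:", lag)
```

With noisy real data, a cross-correlation over the whole curve would likely be a more robust estimate than comparing single maxima.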
A set of genomic studies of Shewanella oneidensis bacteria has been completed by a wider collaboration of scientists in academia, national labs, and private industry.[PNNL] Final reports[SciTech Connect, SciTech Connect, SciTech Connect] from three of the groups, at the University of Oklahoma’s Institute for Environmental Genomics, Michigan State University, and the University of Southern California, focus respectively on Shewanella strains’ energy metabolism, influences on their genomic content, and their ability to exchange electrons with solid-state electron acceptors and donors. As brief as this list of goals is, the reports show that these few items encompass many details of Shewanella genetic expression. One reason for DOE interest in these bacteria comes through in the reports: “Collectively, these findings provide important new information toward identifying the most effective Shewanella strain for cleaning up specific contaminants and metals in a given environment, beginning with the extensively contaminated DOE facilities at several National Laboratories and extending to metal contaminated sites worldwide.”[SciTech Connect]
Figure 4. “We propose that Shewanella use riboflavin as both an electron shuttle and an attractant to direct cell movement toward local sources of insoluble electron acceptors. The cells secrete reduced riboflavin, which diffuses to a nearby particle containing an insoluble electron acceptor and is oxidized. The oxidized riboflavin then diffuses away from the particle, establishing a spatial gradient that draws cells toward the particle. The proposed mechanism was supported by experimental and mathematical modeling results.” (From “Integrated Genome-Based Studies for Shewanella Ecophysiology”[SciTech Connect], Michigan State University, p. 7.)
New technology
New techniques for doing something often enable the accomplishment of multiple purposes beyond those that the techniques’ inventors had in mind. One purpose that could be served by determining a genome’s sequence of DNA more quickly and efficiently is recognizing and responding to disease outbreaks in a more timely manner—whether the diseases are natural occurrences or the result of biological warfare or terrorist attacks. These possibilities are discussed in the Los Alamos National Laboratory (LANL) report “State of the Art for Autonomous Detection Systems using Genomic Sequencing”[SciTech Connect] and in the report “Genomics-Enabled Sensor Platform for Rapid Detection of Viruses Related to Disease Outbreak”[SciTech Connect] from Sandia National Laboratories by scientists at Sandia, Colorado State University, and Washington State University.
The LANL report notes that the U.S. Department of Homeland Security’s BioWatch[Wikipedia] program provides for warnings of aerosol attacks with biological threat agents, “an important risk mitigation capability” that nonetheless has “many operationally significant challenges” to address:
Operational experience with the current BioWatch system indicates strong need for improved accuracy while maintaining robust precision. The great diversity of the microbial world coupled with the fact that only a fraction of this diversity has been identified and characterized has resulted in a significant number of “environmental positives” that have adversely impacted BioWatch performance. The fact that no aerosol attacks have been detected or impacts observed provides little evidence for understanding the false negative potential. This is greatly compounded by the inherent uncertainty of the biothreat.
… The uncertainty associated with the biothreat [e.g., which agent(s) will be encountered, at which location(s), when will it occur, how much and how will it be dispersed] provides great operational constraints on BioWatch. A large number of detector units are needed to cover populations at risk and the system should be responsive to all potential biothreat agents (including emerging, reemerging and engineered pathogens) that may be presented as aerosol threats. In addition, indication of unique agent phenotypic[Wikipedia] characteristics is desirable to guide response decisions (e.g., is the specific strain encountered responsive to a particular antibiotic).
The current BioWatch system is focused on a specific set of pathogens and provides some level of identification. To be fully responsive to the potential bioaerosol threat the scope of agents addressed must be greatly increased and the cost of coverage must be drastically decreased. (P. 2)
The report then discusses the advantages of the newest genome-sequencing techniques (Next Generation Sequencing, or NGS) for addressing the challenges:
For many applications, sequencing offers great advantages over the traditional methods. For example, in the field of pathogen detection, NGS cannot only identify known organisms, but also indication of novel, emerging, and engineered ones. ... In addition, NGS does not require prior knowledge of pathogens present in a sample like the traditional detection methods. Therefore, NGS shows promise as the ultimate pathogen detection tool. (P. 6)
Until recently, NGS was a slow and costly process. However, it is becoming cost competitive and sufficiently rapid for many applications. Even though NGS is unlikely to replace the rapid and portable pathogen detection platforms in the next couple of years, in many cases it will provide actionable information faster than the rapid systems. This is mainly due to the comprehensive information provided in an organism’s sequence, versus a few selected segments of the genome. It is the only technology that can perform all of the following tasks in parallel from almost any sample: 1) detect all known pathogens: viruses, bacteria, and protozoa, 2) identify emerging pathogens, whether they have naturally evolved or been engineered, and 3) characterize the pathogens (for example, determine antibiotic resistance or pathogenicity).
Over the next 2-3 years NGS applications will likely help generate a world map displaying the real-time status of all infectious diseases. The data will be provided by a global network of interconnected facilities that use NGS platforms. Sequencing data, combined with the computational models of disease progression and easy visualization, will enable accurate prediction and monitoring of disease spread, and reduce the effects on human lives and local economies.
With the existing or forthcoming hardware and software upgrades, NGS technology will provide actionable information in 8-48 hours (including sample preparation, analysis and interpretation), depending on the platform, the number of samples, and types of information needed. The simplest process includes detection of known pathogens and determination of some of their features, such as antibiotic resistance. More complex processes will involve identification of novel pathogens in mixed samples (clinical or environmental samples such as BioWatch aerosol samples), prediction of their pathogenicity and susceptibility to antibiotics, vaccine efficacy, and matching their identities to pathogens that previously caused serious outbreaks. (Pp. 7-8)
The report also provides a table describing four commercially available genome-sequencing devices.
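One way to picture how sequencing can report known organisms while also flagging unfamiliar material is to match short subsequences (k-mers) of each read against an index built from reference genomes. The sketch below is a toy under stated assumptions: the reference sequences, organism names, and reads are hypothetical, the k-mer length is far shorter than practical tools use, and nothing here reproduces the methods or platforms discussed in the LANL report.

```python
K = 8  # k-mer length (assumption: chosen small so the toy data works)

def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical reference sequences, for illustration only.
references = {
    "organism_X": "ATGCGTACGTTAGCCTAGGATCCA",
    "organism_Y": "TTGACCGGTAACGTCAGTCCATGA",
}

# Index: k-mer -> names of the references that contain it.
index = {}
for name, seq in references.items():
    for kmer in kmers(seq):
        index.setdefault(kmer, set()).add(name)

def classify(read):
    """Name the reference sharing the most k-mers with the read, or
    'unmatched' if no k-mer of the read appears in any reference."""
    hits = {}
    for kmer in kmers(read):
        for name in index.get(kmer, ()):
            hits[name] = hits.get(name, 0) + 1
    if not hits:
        return "unmatched"
    return max(hits, key=hits.get)

for read in ("CGTACGTTAGCC", "GGTAACGTCAGT", "CCCCCCCCCCCC"):
    print(read, "->", classify(read))
```

Reads labeled "unmatched" are only candidates for further analysis; deciding whether they come from a novel or engineered organism would need much more than this lookup.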
The Sandia report describes a project to facilitate the diagnosis of viruses carried by arthropods[Wikipedia, Wikipedia], which “[i]n recent decades … have emerged as some of the most significant threats to human health”, by developing an electrochemical assay and a fluorescent-based assay for detecting RNA from a wide range of arthropod-borne viruses, and developing prototype microfluidic diagnostic platforms that use these assays. The investigators “generated and characterized suitable primers for West Nile Virus RNA detection. Both optical and electrochemical transduction technologies were developed for DNA-RNA hybridization detection and were implemented in microfluidic diagnostic sensing platforms that were developed in this project.”
While genomic data could be used to recognize an outbreak of disease, a new patent assigned to the Regents of the University of California describes a use of such data to combat its sources. As the patent (“Uses of Antimicrobial Genes from Microbial Genome”[DOepatents]) notes in its “Background of the Invention” section, “Microbes … frequently produce and secrete compounds aimed at killing other microbes which help them in their continuous struggle for survival in their ecological niche. … Proteins that target bacteria have a broad medical and biotechnological application spectrum. They can be used as direct antibiotics for human and veterinary medicine …, as growth enhancers in livestock …, as food preservatives ... as killers of phytopathogenic bacteria for crop management …, etc.”
When it comes to identifying the genes that carry the pattern for making these toxins, though, one of the common methods of studying a gene’s function presents a problem. The method, “to clone [the gene] into a model bacteria species (with Escherichia coli (E. coli) being the most popularly used model) and to study the expressed product”, takes advantage of the relatively direct connection between a gene and what it programs a cell to make: to find out what a gene’s function is in the organism it came from, insert the gene into some bacteria and see how the bacteria’s products change once the gene is inserted. But for genes that produce antibacterial toxins, this method won’t necessarily work in the usual way since “gene products that are toxic to bacteria will usually be uncloneable in E. coli due to their negative effect on the bacterial growth.”
Actually, as the patent shows, the method does work, if you consider that the product of such a gene is the E. coli’s response of not cloning it. Find which inserted genes don’t get cloned, or get cloned a lot less than other genes, and you’ve found what might be a gene that contains the pattern for making a toxin.
Thus “the present invention provides a method for identifying regions from microbial genomes that are uncloneable into E. coli, retrieve antimicrobial genes that reside in these regions, and demonstrate their toxicity to E. coli and other pathogenic microbes.” “Also described,” according to the patent’s abstract, “are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.”
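The idea of looking for regions that fail to clone can be sketched as a coverage calculation: mark every genome position covered by at least one successfully cloned fragment, then report the stretches left uncovered. This is a simplified illustration, not the patent's procedure; it assumes clone positions are already known as (start, end) intervals, and the function name, `min_gap` parameter, and coordinates are all hypothetical.

```python
def uncovered_regions(genome_length, clones, min_gap=1):
    """Return half-open (start, end) intervals covered by no clone.

    clones is a list of half-open (start, end) intervals; gaps shorter
    than min_gap positions are ignored.
    """
    covered = [False] * genome_length
    for start, end in clones:
        for pos in range(max(start, 0), min(end, genome_length)):
            covered[pos] = True

    gaps = []
    gap_start = None
    for pos, is_covered in enumerate(covered):
        if not is_covered and gap_start is None:
            gap_start = pos  # a gap begins here
        elif is_covered and gap_start is not None:
            if pos - gap_start >= min_gap:
                gaps.append((gap_start, pos))
            gap_start = None
    # Close a gap that runs to the end of the genome.
    if gap_start is not None and genome_length - gap_start >= min_gap:
        gaps.append((gap_start, genome_length))
    return gaps

# Hypothetical clone coordinates on a 150-position toy genome.
clones = [(0, 40), (30, 70), (90, 120), (110, 150)]
print(uncovered_regions(150, clones, min_gap=10))
```

A gap found this way is only a candidate: a region may also go uncloned by chance, so the toxicity of any gene in it would still have to be demonstrated, as the patent text indicates.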
Current methods of genome engineering typically introduce one DNA construct per cell, generally at a low efficiency (around 0.1%). Sometimes a large collection of constructs is introduced into a large number of cells simultaneously, in a single tube, to produce a clone ‘library’, but the intention is still typically to have one DNA type per cell. To eliminate the many surviving unwanted cells lacking any new DNA, typically a selection and/or screen is performed at each step of a multi-step construction. It is rare to complete a genome engineering construct with more than a dozen steps. (“Background” section of “Multiplex automated genome engineering”[DOepatents].)
To solve this problem,
The present invention is based on the discovery of a method to introduce multiple nucleic acid sequences into one or more cells such that the entire cell culture approaches a state involving a large set of changes to each genome or region. This novel method can be used to generate one specific configuration of alleles [alternate forms of particular genes[Wikipedia]] or can be used to combinatorial[Wiktionary] exploration of designed alleles optionally including random, i.e., not-designed, changes. This novel method can be used with any of a variety of devices that allow the cyclic addition of many DNAs in parallel in random or specific order, with or without use of one or more selectable markers.
The patent goes on to describe in detail the method of introducing multiple DNA sequences into a cell.
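Why cyclic addition of DNA could let a culture "approach" a fully modified state can be seen with a little arithmetic. The sketch below rests on a simplifying assumption that is not taken from the patent: each cycle independently introduces a given change into a cell with the same probability. The 0.1% figure is borrowed from the quoted background purely as an illustrative per-cycle value.

```python
def fraction_modified(per_cycle_efficiency, cycles):
    """Fraction of cells carrying a given change after repeated cycles,
    assuming each cycle succeeds independently with the same probability
    (an illustrative model, not the patent's own analysis)."""
    return 1.0 - (1.0 - per_cycle_efficiency) ** cycles

# Illustrative per-cycle efficiency of 0.1% (0.001).
for cycles in (1, 10, 100, 1000):
    print(cycles, round(fraction_modified(0.001, cycles), 4))
```

Under this model the modified fraction grows with every cycle and tends toward 1, which is the qualitative behavior the quoted passage describes; real efficiencies and cell survival would change the numbers.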
Structural genomics
Two other recent reports from Los Alamos National Laboratory deal with investigations of structural genomics, which aims at determining the 3-dimensional structure of every protein that a given genome describes[Wikipedia]. Indeed, one of these reports outlines a program for the field that states the goal as “know the structure of all proteins”.
This report (“Achievements and future of structural genomics”[SciTech Connect]), a slide presentation for a September 2013 conference, describes a threefold strategy for determining all proteins’ structures:
- Solve structures now so that they will be there when we need them
- Solve enough structures to infer structures of most proteins
- Solve groups of structures that give a deep biological or biophysical understanding
The last point indicates that the aim is not simply to have data about protein shapes, but to gain insight into how organisms function.
Figure 5: “Semi-automated Approaches Allow Determination of Large Structures”. (From the slide presentation “Achievements and future of structural genomics”[SciTech Connect], Los Alamos National Laboratory, p. 14 of 56.)
Much of the rest of the presentation deals with systematically targeting proteins for structure determination, making those structures available for everyone to use, cooperating internationally, and providing technologies that can be used by all biologists. One example of a new technology is described in the other Los Alamos report, “Application of Dye-Ligand Affinity Chromatography to Structural Genomics”[SciTech Connect], an abstract of an invited talk at the Korea Research Institute of Bioscience and Biotechnology. The new technology it describes is a high-throughput technique using Cibacron Blue dye to identify proteins from cell extracts that interact with ligands (substances that bind to biomolecules)[Wikipedia]. About 50% of the proteins in the extract bind to the dye; further processing steps identify those proteins that interact with specific ligands. The capture of proteins targeted by drugs and the determination of protein structures from examination of their crystalline forms have both been facilitated by use of the new technique.
The “future” mentioned in “Achievements and future of structural genomics” is dealt with at length in the last section, which contains a list of fundamental questions for the future (Figure X) along with ideas about what their answers will mean. Knowledge of the genome itself, combined with a knowledge of the structures specified by the genome and what functions these structures perform, is expected to lead beyond information about individual protein molecules to a detailed understanding of metabolic pathways, cells, and entire organisms. Like other biological knowledge already gained over the centuries, such an understanding of organisms should, beyond its intrinsic interest or its use for energy production and environmental protection, have medical value as well, in this case by enabling people to design proteins and develop therapeutics.
Reports Available through OSTI’s SciTech Connect
Reports Available through OSTI’s DOepatents
Additional References
One of the Joint Genome Institute’s mission areas, carbon sequestration, is the focus of much research sponsored by the U.S. Department of Energy, as was indicated in the April 2014 Science Showcase. As noted on page 40 of the Institute’s 2013 Progress Report, the Institute “used several million CPU hours on NERSC’s first petascale supercomputer, Hopper” that year for calculations that could not have been completed on its terabyte computing cluster, Genepool. High-performance computing at the petascale (involving quadrillions of operations per second and/or quadrillions of bytes of data) was the topic of the January 2014 Science Showcase.