OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Using support vector machines to improve elemental ion identification in macromolecular crystal structures

Abstract

A method to automatically identify possible elemental ions in X-ray crystal structures has been extended to use support vector machine (SVM) classifiers trained on selected structures in the PDB, with significantly improved sensitivity over manually encoded heuristics. In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here, the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.
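The classification step described above (labelling each isolated site as water or as one of several ions) can be illustrated with a minimal sketch. The abstract does not state the kernel, the per-site features, or the multiclass scheme, so everything here is an assumption: the sketch trains one linear SVM per class (one-vs-rest) with Pegasos-style stochastic subgradient descent on the hinge loss, then assigns each site to the class whose SVM gives the largest decision value. The two-number feature vectors are invented toy values chosen only so the three classes form separate clusters; they are not the paper's features or data. The labels HOH, CA and ZN follow PDB residue naming, and the function names train_binary_svm, train_one_vs_rest and predict are hypothetical.

```python
"""Minimal sketch (assumption, not the paper's code): one-vs-rest linear SVMs
that label isolated sites as water or an ion from per-site feature vectors."""
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _augment(x: Sequence[float]) -> Vector:
    # Append a constant 1.0 so the last weight acts as a (regularized) bias.
    return list(x) + [1.0]


def _score(w: Vector, z: Vector) -> float:
    # Linear decision value w . z for an augmented feature vector z.
    return sum(wj * zj for wj, zj in zip(w, z))


def train_binary_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> Vector:
    """Pegasos-style stochastic subgradient descent on the L2-regularized
    hinge loss. ys holds +1/-1 labels; returns weights for augmented inputs."""
    rng = random.Random(seed)
    data = [_augment(x) for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                          # decreasing step size
            violated = ys[i] * _score(w, data[i]) < 1.0    # margin check on old w
            w = [(1.0 - eta * lam) * wj for wj in w]       # regularizer shrinkage
            if violated:                                   # hinge-loss subgradient
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w


def train_one_vs_rest(xs: Sequence[Sequence[float]], labels: Sequence[str],
                      **kwargs) -> Dict[str, Vector]:
    """Train one binary SVM per class: that class is +1, all others are -1."""
    return {
        cls: train_binary_svm(xs, [1 if lab == cls else -1 for lab in labels], **kwargs)
        for cls in sorted(set(labels))
    }


def predict(models: Dict[str, Vector], x: Sequence[float]) -> str:
    """Pick the class whose binary SVM returns the largest decision value."""
    z = _augment(x)
    return max(models, key=lambda cls: _score(models[cls], z))


# Hypothetical toy features (two pre-scaled numbers per site); invented values,
# not real crystallographic measurements.
train_x = [
    [0.0, 0.1], [0.1, 0.0], [-0.1, 0.1], [0.0, -0.1],  # sites labelled HOH
    [0.1, 2.0], [-0.1, 1.9], [0.0, 2.1], [0.1, 2.1],   # sites labelled CA
    [2.0, 1.0], [2.1, 1.1], [1.9, 0.9], [2.0, 1.1],    # sites labelled ZN
]
train_y = ["HOH"] * 4 + ["CA"] * 4 + ["ZN"] * 4

models = train_one_vs_rest(train_x, train_y)
for query in ([0.0, 0.0], [0.0, 2.0], [2.0, 1.0]):
    print(query, predict(models, query))
```

A linear one-vs-rest scheme keeps the sketch short and dependency-free; a real classifier would more plausibly use a kernel SVM from an established library, standardized features computed from refined models, and evaluation on held-out structures, in the spirit of the independent validation test set mentioned in the abstract.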

Authors:
 [1];  [2];  [3];  [3];  [2]
  1. University of California, Berkeley, CA 94720 (United States)
  2. (United States)
  3. Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States)
Publication Date:
2015-05-01
OSTI Identifier:
22351152
Resource Type:
Journal Article
Resource Relation:
Journal Name: Acta Crystallographica. Section D: Biological Crystallography; Journal Volume: 71; Journal Issue: Pt 5; Other Information: PMCID: PMC4427199; PMID: 25945580; PUBLISHER-ID: tz5065; OAI: oai:pubmedcentral.nih.gov:4427199; Copyright (c) Morshed et al. 2015; This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.; Country of input: International Atomic Energy Agency (IAEA)
Country of Publication:
Denmark
Language:
English
Subject:
75 CONDENSED MATTER PHYSICS, SUPERCONDUCTIVITY AND SUPERFLUIDITY; ACCURACY; ATOMS; CALCIUM; CRYSTAL STRUCTURE; DENSITY; ELECTRON DENSITY; ENVIRONMENT; ERRORS; IRON; MOLECULES; NICKEL; SCATTERING; VALIDATION; ZINC

Citation Formats

Morshed, Nader, Echols, Nathaniel, and Adams, Paul D. Using support vector machines to improve elemental ion identification in macromolecular crystal structures. Denmark: N. p., 2015. Web. doi:10.1107/S1399004715004241.
Morshed, Nader, Echols, Nathaniel, & Adams, Paul D. Using support vector machines to improve elemental ion identification in macromolecular crystal structures. Denmark. doi:10.1107/S1399004715004241.
Morshed, Nader, Echols, Nathaniel, and Adams, Paul D. 2015. "Using support vector machines to improve elemental ion identification in macromolecular crystal structures". Denmark. doi:10.1107/S1399004715004241.
@article{osti_22351152,
title = {Using support vector machines to improve elemental ion identification in macromolecular crystal structures},
author = {Morshed, Nader and Echols, Nathaniel and Adams, Paul D.},
abstractNote = {A method to automatically identify possible elemental ions in X-ray crystal structures has been extended to use support vector machine (SVM) classifiers trained on selected structures in the PDB, with significantly improved sensitivity over manually encoded heuristics. In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here, the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.},
doi = {10.1107/S1399004715004241},
journal = {Acta Crystallographica. Section D: Biological Crystallography},
number = {Pt 5},
volume = {71},
place = {Denmark},
year = {2015},
month = {May}
}