Title: Outlier analyses of the Protein Data Bank archive using a probability-density-ranking approach

Journal Article · Scientific Data
 [1];  [1];  [1];  [1];  [2]
  1. Rutgers Univ., Piscataway, NJ (United States)
  2. Rutgers Univ., Piscataway, NJ (United States); Univ. of California San Diego, La Jolla, CA (United States)

Outlier analyses are central to scientific data assessments. Conventional outlier identification methods do not work effectively for Protein Data Bank (PDB) data, which are characterized by heavy skewness and the presence of bounds and/or long tails. We have developed a data-driven nonparametric method to identify outliers in PDB data based on kernel probability density estimation. Unlike conventional outlier analyses based on location and scale, Probability Density Ranking can be used for robust assessments of distance from other observations. Analyzing PDB data from the vantage points of probability and frequency enables proper outlier identification, which is important for quality control during deposition-validation-biocuration of new three-dimensional structure data. Ranking of Probability Density also permits use of Most Probable Range as a robust measure of data dispersion that is more compact than Interquartile Range. The Probability-Density-Ranking approach can be employed to analyze outliers and data spread on any large data set with a continuous distribution.
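The idea of ranking observations by estimated probability density can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the kernel, the bandwidth selection, or the exact cutoffs. The Gaussian kernel, the Silverman rule-of-thumb bandwidth, the 5% outlier fraction, and the definition of Most Probable Range as the span of the 50% highest-density observations are all assumptions made here for illustration, as are the function names.

```python
import math
import random
import statistics


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth (an assumption; the abstract names no selector)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def kde_at_observations(data, bandwidth):
    """Gaussian kernel density estimate evaluated at every observation (O(n^2))."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in data)
        for x in data
    ]


def density_ranking(data, outlier_fraction=0.05, core_fraction=0.5):
    """Rank observations by estimated density.

    Returns the lowest-density observations (flagged as outliers) and the
    span of the highest-density observations (used here as a stand-in for
    the Most Probable Range). For a multimodal sample the high-density set
    may not be a single interval; this sketch reports only its min and max.
    """
    density = kde_at_observations(data, silverman_bandwidth(data))
    order = sorted(range(len(data)), key=lambda i: density[i])  # ascending density
    n_out = round(len(data) * outlier_fraction)
    n_core = max(1, round(len(data) * core_fraction))
    outliers = sorted(data[i] for i in order[:n_out])
    core = [data[i] for i in order[-n_core:]]
    return outliers, (min(core), max(core))


# Synthetic skewed, lower-bounded, long-tailed sample (not PDB data).
random.seed(0)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(500)]

outliers, mpr = density_ranking(sample)
q1, _, q3 = statistics.quantiles(sample, n=4)
iqr_width = q3 - q1

print("flagged outliers:", len(outliers))
print("most probable range width:", mpr[1] - mpr[0])
print("interquartile range width:", iqr_width)
```

On a skewed sample like this one, the high-density span sits around the mode rather than being centered between the quartiles, which is why it can be narrower than the interquartile range. Because the ranking uses density rather than distance from a mean or median, isolated values in the long tail are flagged while values crowded near the lower bound are not.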

Research Organization:
Rutgers Univ., Piscataway, NJ (United States)
Sponsoring Organization:
USDOE; National Science Foundation (NSF); National Institute of General Medical Sciences (NIGMS); National Cancer Institute (NCI)
Grant/Contract Number:
NSF-DBI 1338415
OSTI ID:
1624556
Journal Information:
Scientific Data, Vol. 5, Issue 1; ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
