U.S. Department of Energy
Office of Scientific and Technical Information

Correlated mutations in protein sequences: Phylogenetic and structural effects

Technical Report · DOI: https://doi.org/10.2172/296863 · OSTI ID: 296863
 [1];  [2];  [1];  [3]
  1. Los Alamos National Lab., NM (United States). Theoretical Div.
  2. C.E.N. Saclay, Gif/Yvette (France). Service Physique Theorique
  3. Univ. of Colorado, Boulder, CO (United States). Dept. of Molecular, Cellular and Developmental Biology
Covariation analysis of sets of aligned RNA sequences is relatively successful in elucidating RNA secondary structure, as well as some aspects of tertiary structure. Covariation analysis of sets of aligned protein sequences succeeds in certain instances in elucidating structural and functional links, but in general, pairs of sites displaying highly covarying mutations in protein sequences do not necessarily correspond to sites that are spatially close in the protein structure. In this paper, the authors identify two reasons why naive use of covariation analysis for protein sequences fails to reliably indicate sequence positions that are spatially proximate. The first reason is the bias introduced into the calculation of covariation measures because biological sequences are generally related by a non-trivial phylogenetic tree. The authors present a null-model approach to solve this problem. The second reason is linked chains of covariation, which can cause pairs of sites to display significant covariation even though they are not spatially proximate. The authors present a maximum entropy solution to this classic problem of causation versus correlation. Both methodologies are validated in simulation.
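To make the idea of a covariation measure concrete, the following is a minimal sketch of one common choice, the mutual information between two columns of an alignment. This record does not say which measure the authors use, so mutual information, the function names `mutual_information` and `column`, and the four-sequence toy alignment are all assumptions made for illustration. The sketch is the naive measure only: it applies neither the null-model correction for phylogeny nor the maximum entropy treatment of linked chains that the abstract describes.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    Illustrative covariation measure only; the report's own measure
    is not specified in this record.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    count_i = Counter(col_i)            # residue counts at site i
    count_j = Counter(col_j)            # residue counts at site j
    count_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def column(alignment, k):
    """Extract column k (0-based) from a list of equal-length sequences."""
    return [seq[k] for seq in alignment]


# Hypothetical toy alignment: sites 0 and 1 vary together, site 2 varies
# independently of both.
toy_alignment = ["ADK", "ADR", "GEK", "GER"]

mi_01 = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
mi_02 = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
print(mi_01, mi_02)
```

In this toy case the perfectly covarying pair of sites scores 1 bit and the independent pair scores 0 bits. On real alignments, a high score from a measure like this can also arise from shared ancestry or from indirect chains of covariation, which are the two failure modes the abstract identifies.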
Research Organization: Los Alamos National Lab., Theoretical Div., NM (United States)
Sponsoring Organization: USDOE Office of Energy Research, Washington, DC (United States)
DOE Contract Number: W-7405-ENG-36
OSTI ID: 296863
Report Number(s): LA-UR--98-1091; CONF-9707181--; ON: DE99000640
Country of Publication: United States
Language: English
