Better models by discarding data?
- Oregon State University, Corvallis, OR 97331 (United States)
Making the most of hard-won data in protein crystallography: to keep or not to keep, that is the question. In macromolecular X-ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, that can be used for this purpose were characterized and it was shown that CC1/2 has superior properties compared with ‘merging’ R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and the more conventional indicators were compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations controlled ‘paired-refinement’ tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data-filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data-quality indicator for which the behaviour accurately reflects which of the alternative data-handling strategies results in the best-quality refined model. Its properties in the presence of systematic error are documented and discussed.
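As a rough illustration of the two indicators named in the abstract, the sketch below computes CC1/2 as the Pearson correlation between the mean intensities of two random halves of the repeated measurements, and derives CC* from it with the commonly used relation CC* = sqrt(2·CC1/2 / (1 + CC1/2)). The function names (`cc_half`, `cc_star`) and the dictionary data layout are assumptions made for this example and do not come from the record or from any crystallographic package. Real data-processing programs also apply resolution binning and weighting that are omitted here.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cc_half(observations, rng):
    """Estimate CC1/2 from repeated intensity measurements.

    `observations` (hypothetical layout) maps a unique-reflection label to
    the list of its repeated intensity measurements. Each list is split at
    random into two halves, and the half-set means are correlated.
    """
    half_a, half_b = [], []
    for measurements in observations.values():
        if len(measurements) < 2:
            continue  # a single measurement cannot be split into two halves
        shuffled = list(measurements)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(sum(shuffled[:mid]) / mid)
        half_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return pearson(half_a, half_b)


def cc_star(cc12):
    """CC* = sqrt(2 * CC1/2 / (1 + CC1/2)); only meaningful for CC1/2 >= 0."""
    if cc12 < 0:
        raise ValueError("CC* is undefined for negative CC1/2")
    return math.sqrt(2.0 * cc12 / (1.0 + cc12))


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data (not experimental): three reflections, four measurements each.
    simulated = {
        "refl_1": [10.2, 9.8, 10.5, 9.9],
        "refl_2": [20.1, 19.5, 20.4, 20.0],
        "refl_3": [35.3, 34.8, 35.1, 34.6],
    }
    cc12 = cc_half(simulated, rng)
    print("CC1/2 =", cc12, "CC* =", cc_star(cc12))
```

Because the half-set assignment is random, repeated runs with different seeds give slightly different CC1/2 values; passing an explicit `random.Random` instance keeps a given run reproducible.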
- OSTI ID:
- 22347848
- Journal Information:
- Acta Crystallographica. Section D: Biological Crystallography, Vol. 69, Issue Pt 7; Other Information: PMCID: PMC3689524; PMID: 23793147; PUBLISHER-ID: ba5192; OAI: oai:pubmedcentral.nih.gov:3689524; Copyright (c) Diederichs & Karplus 2013; This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.; Country of input: International Atomic Energy Agency (IAEA); ISSN 0907-4449
- Country of Publication:
- Denmark
- Language:
- English
Similar Records
Weak data do not make a free lunch, only a cheap meal