Improvement of MD-based protocols for the refinement of 3D protein models

Adiyaman, R. (2021) Improvement of MD-based protocols for the refinement of 3D protein models. PhD thesis, University of Reading

Text (Redacted) - Thesis (10MB) · Restricted to Repository staff only until 29 March 2023.
Text - Thesis (13MB) · Restricted to Repository staff only
Text - Thesis Deposit Form (407kB) · Restricted to Repository staff only

To link to this item DOI: 10.48683/1926.00104643

Abstract/Summary

Proteins are vital constituents of living cells with diverse structural and functional roles. Therefore, the study of the functions and structures of proteins is key to a full understanding of living systems at the molecular level. X-ray crystallography and Nuclear Magnetic Resonance (NMR) are the main experimental techniques used to determine protein structures. However, such procedures are costly, labour intensive and time consuming, and many proteins are problematic to solve experimentally. In silico modelling of protein structures provides a potential solution to bridge the huge protein sequence-structure knowledge gap, which is widening due to the relative efficiency of cheap Next Generation Sequencing compared with experimental methods for determining structures. Nevertheless, the accuracy of predicted 3D structures may not always be adequate for further biological studies compared to experimental data. The refinement of 3D protein models refers to the process used to improve predicted structures by moving them closer towards experimental quality. Since the 10th Critical Assessment of Structure Prediction (CASP10), Molecular Dynamics (MD)-based refinement protocols have been found to be more effective than other protocols. However, the most successful MD-based protocols generally require supercomputer-scale resources in order to refine a single 3D protein model. The ReFOLD server was developed by our group to rapidly refine 3D models with more modest computational resources. However, in CASP12 it was found that many of the 3D models from ReFOLD still contained structural flaws and some had drifted further away from the native structure during the refinement process. Many restraint strategies have been used to protect 3D models from the undesired deviations caused by force field inaccuracies.
Here, we propose to use prior predictive data to provide reliable guidance to the original MD-based protocol of ReFOLD, in order to direct the refinement of models towards the native basin. In the first part of this study, the predicted local model quality scores produced by the ModFOLD server were utilised to guide the original MD-based protocol of ReFOLD. A fixed threshold based on the predicted per-residue error was applied to determine the poorly predicted regions in a 3D model, which could be targeted for refinement. The local quality assessment guided restraint strategy was successful in improving a higher number of 3D models, outperforming the original MD-based protocol according to observed scores. The local quality assessment guided MD-based protocol was also used to refine CASP13 targets, for the refinement and regular prediction categories, and it ranked among the top 10 approaches according to the official independent assessment. Following the CASP13 experiment, the application of a fixed threshold based on the per-residue accuracy score was found to be less applicable to multi-domain structures. Therefore, we proposed a novel gradual restraint strategy that considers the need for refinement of each residue according to its per-residue accuracy score. The gradual restraint strategy led to further increases in the population of improved models compared to the fixed restraint strategy. We also applied our gradual restraint strategy to the refinement of the SARS-CoV-2 targets as part of the CASP Commons COVID-19 initiative, in order to increase the accuracy of the best-predicted 3D models, which were identified by our ModFOLD server. A significant number of the estimated top 10 models for each of the targets were generated by our group, according to the initial CASP Commons evaluation.
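The contrast between the fixed-threshold and gradual restraint strategies described above can be illustrated with a minimal sketch. It is not the protocol used in the thesis: the function names, the 3.0 cut-off, the force constants, the assumption that well-predicted residues are the ones restrained, and the linear mapping from score to restraint strength are all hypothetical choices made for illustration only.

```python
from typing import List


def fixed_threshold_restraints(
    errors: List[float], threshold: float = 3.0, k: float = 1.0
) -> List[float]:
    """Hypothetical fixed-threshold scheme.

    Residues whose predicted error is at or below the threshold are
    treated as well predicted and restrained with force constant k.
    Residues above the threshold are left free (0.0) so that MD can
    refine them. The threshold and k values are assumptions.
    """
    return [k if e <= threshold else 0.0 for e in errors]


def gradual_restraints(scores: List[float], k_max: float = 1.0) -> List[float]:
    """Hypothetical gradual scheme.

    Each residue receives a restraint strength proportional to its
    predicted accuracy score, assumed here to lie in [0, 1]. Scores
    outside that range are clamped. The linear mapping is an assumption.
    """
    return [k_max * min(max(s, 0.0), 1.0) for s in scores]


if __name__ == "__main__":
    # Illustrative, made-up per-residue values.
    print(fixed_threshold_restraints([1.0, 5.0, 3.0]))
    print(gradual_restraints([0.0, 0.5, 1.2], k_max=2.0))
```

Under the fixed scheme every residue is either fully restrained or fully free, whereas the gradual scheme gives each residue its own restraint strength, which is one way to read why a single cut-off might suit multi-domain structures less well.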
Residue-residue contact prediction methods have now reached up to 70% accuracy and have proved to be useful in protein folding, model quality estimation and drug design. In this study, we describe the first attempt at applying contact predictions for refinement, where we use our Contact Distance Agreement (CDA) scores to apply gradual restraints to guide the MD protocol. The contact-assisted restraint strategy performed well, increasing the population of improved models in comparison with the gradual restraint strategy based on the local quality estimation. Finally, a binding site focused MD-based refinement protocol was also developed to improve the quality of the protein-ligand binding sites predicted by our FunFOLD server. This focused refinement protocol was successful at increasing the accuracy of all predicted binding sites that were tested, as well as improving the global model quality of some models. This thesis has focused on exploiting prior predictive data for use in refinement pipelines, to direct the generation of 3D models closer towards the native basin. In the near future, each of the improvements described here will be integrated with new versions of our prediction servers, which will then be made freely accessible for use by general biologists.

Item Type: Thesis (PhD)
Thesis Supervisor: McGuffin, L.
Thesis/Report Department: School of Biological Sciences
Identification Number/DOI: https://doi.org/10.48683/1926.00104643
Divisions: Life Sciences > School of Biological Sciences
ID Code: 104643
