Improvements to methods for the quality assessment of three-dimensional models of proteins

Maghrabi, A. H. A. (2020) Improvements to methods for the quality assessment of three-dimensional models of proteins. PhD thesis, University of Reading

To link to this item DOI: 10.48683/1926.00115841

Abstract/Summary

After water, proteins are the most abundant substances in the human body, forming around 80% of its dry mass. Understanding protein function benefits many areas of life, such as finding medicines, producing healthy foods and combating infectious diseases. Each protein molecule has its own unique sequence, which is composed of a linear chain of amino acids. These amino acid chains fold to form tertiary structures, which confer the protein’s function. It is important that we can characterise protein structures in order to better understand their functions. Several experimental methods, such as X-ray crystallography and nuclear magnetic resonance, have been applied to solve protein structures. However, such methods are costly and time consuming, and some proteins are problematic or impossible to solve using them. Consequently, the growth of protein structure data is slow compared with the speed of sequencing genomes and their encoded proteins, which has kept increasing, especially after breakthroughs in genetic sequencing technology. As a result, a gap has grown between known protein sequences and their resolved structures, and it has been necessary to find other solutions. Computational methods for predicting the structures of proteins directly from their own sequences have become fast and effective alternatives to experimental methods. Over the past 20 years, different types of protein structure prediction methods have emerged, the most accurate being comparative modelling, which consists of a number of steps including template recognition, alignment, quality assessment and, finally, refinement.
Each of these steps contributes to successful modelling pipelines, but perhaps the most critical step for the wider acceptance of 3D models of proteins has been the quality assessment step, where the predicted models are evaluated in terms of their likely accuracy, prior to the availability of an experimental structure. Numerous approaches to the quality estimation problem have been developed over the years, including the use of statistical potentials, stereochemistry checks and machine learning techniques. Such methods have traditionally been referred to as Model Quality Assessment Programs (MQAPs). One of the leading MQAPs has been the ModFOLD method, which has been developed by our group. Since its inception, ModFOLD has been continuously improved, going through many upgrades up to its latest version, ModFOLD7. This study was conducted during a major development cycle, beginning with the benchmarking of ModFOLD6, which was the most powerful MQAP among its competitors at that time. The study starts by investigating the integration of ten MQAP scoring methods in an attempt to enhance performance. It also explores the implementation of deep neural networks in the MQAP pipeline, and how this technique can be used to improve the MQAP scoring approach. In the later stage of our research, we managed to improve our method significantly, leading to the latest upgrade, ModFOLD7. During this project, we also participated in a number of independent blind experiments and competitions to verify our improvements. We also undertook several collaborations in order to apply our methods in practical contexts. The overall results have shown incremental but significant improvements in ModFOLD performance during this study, with an approximate 5% improvement over previous versions. However, there is still plenty of room for ModFOLD to improve further, and a number of suggestions for further developments are made throughout this thesis.

Item Type: Thesis (PhD)
Thesis Supervisor: McGuffin, L.
Thesis/Report Department: School of Biological Sciences
Identification Number/DOI: https://doi.org/10.48683/1926.00115841
Divisions: Life Sciences > School of Biological Sciences
ID Code: 115841
Date on Title Page: September 2019
