Improvement of AlphaFold2-based methods for modelling quaternary structures of proteins
Genc, A. G. (2025) Improvement of AlphaFold2-based methods for modelling quaternary structures of proteins. PhD thesis, University of Reading.
It is advisable to refer to the publisher's version if you intend to cite from this work.
To link to this item, use DOI: 10.48683/1926.00120612
Abstract/Summary
The functions of proteins are determined by their 3D structures, so a range of methods has been developed to predict protein structures as a stepping stone to better understanding their functions and interactions. Protein structure modelling traditionally involved two stages: modelling and refinement. However, the release of the deep neural network-based AlphaFold2 (AF2) as a protein modelling tool in 2020 has enabled significant advances in protein bioinformatics. These advances have made it possible to predict models of monomeric structures that are close to experimentally derived structures. The effective application of machine learning approaches has therefore reduced the need for the traditional refinement process. Instead, modellers use end-to-end processes that cover both modelling and refinement. One of the most important developments for such processes was the open-access release of the AF2 code. As a result, almost all recent tools have integrated the AF2 code into their own pipelines using different methods and parameters, aiming to obtain better models than those produced by the default AF2 method. However, the successes achieved for monomeric globular structures have not yet been realised for multimeric globular structures, and this has increased the need for new modelling tools. Although many AF2 versions have been introduced, the full effectiveness of AF2, and by extension which structures it can accurately predict, is not yet fully understood. Therefore, the basis of this research is to investigate the features of this black box and to explore how to use it most effectively for the improvement of quaternary structure models.
To this end, we aimed to design an improved AF2-based multimeric protein modelling pipeline. The effect of recycling, a key part of the AF2 algorithm, on the refinement of models is investigated in Chapter 2. The results show that in both AF2 versions (AF2_Advanced and AF2_Multimer (AF2M)) the quality of the predicted protein model improves as the number of recycles increases. It is also shown that while 3 recycles is the default value for the AF2 versions, 12 recycles may be more effective for both main versions. With the integration of the custom template option into the AF2M code, the effects of custom templates and recycling methods on protein modelling are examined in Chapter 3. It is shown that providing initial structural information to AF2M as an input, followed by further recycling, can lead to better quality structure models. It is also emphasised that using multiple sequence alignment (MSA) inputs is more effective in AF2M than providing a single sequence (SS). Another new parameter introduced for AF2M to improve modelling was the custom MSA option. Although the effectiveness of the custom template and custom MSA options has been supported by many studies, the effect of altering these input features on AF2M, rather than using the defaults, had not been fully revealed. In Chapter 4, we discovered that when multimeric custom template structures are given to AF2M as a “single-chain” protein structure, a cumulative improvement in TM-scores and lDDT scores is observed, although there is no improvement in interface scores (QS-scores and DockQ_wave scores). Furthermore, to obtain custom MSAs, disordered residues in homologous sequences were deleted within MSAs, so that AF2M made its predictions only from residues corresponding to ordered regions. As a result, AF2M obtained higher quality protein structures for more than half of the targets.
These two major results emphasise that input changes to AF2M can be more effective for target-specific protein modelling than for general protein modelling. Finally, based on the results from the previous chapters, we designed two successive versions of a protein modelling tool called MultiFOLD, which aims to create a pool of models by conformational sampling using custom template recycling, followed by ranking and selection. In Chapter 5, through extensive analysis of benchmarking data, we demonstrate that MultiFOLD is particularly effective in modelling multimeric globular structures, and that the latest version, MultiFOLD2, outperformed all other servers participating in the CAMEO-BETA project, including AlphaFold3 (AF3). With the acquisition of better-quality protein structures, it is now possible to better infer function and to model protein-ligand interactions in downstream analyses.
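The custom MSA step described in the abstract (deleting disordered residues from homologous sequences so that only ordered regions inform the prediction) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes that per-residue disorder flags come from some external disorder predictor, that "deleting" a residue means replacing it with a gap character so alignment columns are preserved, and that the first MSA row is the query and is left unchanged. The function names `mask_disordered` and `filter_msa` are hypothetical.

```python
from typing import List

GAP = "-"


def mask_disordered(aligned_seq: str, disordered: List[bool]) -> str:
    """Replace residues flagged as disordered with gap characters.

    `disordered` holds one flag per residue (non-gap character) of the
    aligned sequence, in order. Existing gaps pass through unchanged, so
    the alignment length is preserved.
    """
    n_residues = sum(ch != GAP for ch in aligned_seq)
    if n_residues != len(disordered):
        raise ValueError("need exactly one disorder flag per residue")
    out = []
    i = 0  # index into the per-residue flags
    for ch in aligned_seq:
        if ch == GAP:
            out.append(ch)
        else:
            out.append(GAP if disordered[i] else ch)
            i += 1
    return "".join(out)


def filter_msa(msa: List[str], homolog_flags: List[List[bool]]) -> List[str]:
    """Mask disordered residues in every homologous row of an MSA.

    Assumption: msa[0] is the query and is kept as-is; homolog_flags[k]
    belongs to msa[k + 1].
    """
    if len(homolog_flags) != len(msa) - 1:
        raise ValueError("need one flag list per homologous sequence")
    masked = [mask_disordered(s, f) for s, f in zip(msa[1:], homolog_flags)]
    return [msa[0]] + masked


if __name__ == "__main__":
    msa = ["MKTAY", "MK-AY", "MRTAY"]
    flags = [
        [False, True, False, False],       # flags for the residues of "MK-AY"
        [True, False, False, False, True],  # flags for the residues of "MRTAY"
    ]
    for row in filter_msa(msa, flags):
        print(row)
```

Replacing residues with gaps rather than removing characters keeps every row the same length, which alignment formats such as A3M or FASTA alignments require. Whether the thesis pipeline handles the deletion this way is not stated in the abstract.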