Atmakuru, A. (2025) Enhancing the performance and transparency of Machine Learning (ML) using MRI-derived data: alternative approaches to ML interpretability-explainability. PhD thesis, University of Reading. doi: 10.48683/1926.00127413
Abstract/Summary
This thesis addresses three fundamental challenges in enhancing the performance of Machine Learning (ML) models. Despite their evolving predictive capabilities, ML models still have significant limitations in generalisability (particularly in high-dimensional settings), in interpretability, and in their high data requirements. These issues call for methods that reduce input dimensionality, enhance transparency, and use prior knowledge to reduce the amount of data required. Such methods improve the performance, reliability, and efficiency of ML solutions in practical applications. Accordingly, this thesis introduces three independent methods, each responding to one of these limitations, to enhance the performance and transparency of models in complex task domains. First, two filter-based feature selection techniques, one correlation-driven and the other clustering-based, are developed to reduce redundancy and enhance generalisability in high-dimensional data. The correlation-based technique outperforms the state of the art (represented by ReliefF) in both internal and external validation. Second, an ensemble explainability framework integrates Shapley Additive Explanations (SHAP) values with Sobol indices, combining their rankings to yield stable and interpretable attributions. Third, a multi-stage algorithm couples transfer learning with an autoencoder to minimise labelled-data requirements without degrading performance. All proposed methods yielded quantifiable improvements. The feature selection techniques reduced input dimensionality while improving accuracy and generalisability relative to ReliefF. The ensemble explainability framework produced consistent attributions under varying data distributions and reliably identified informative input features. The multi-stage algorithm achieved improved classification performance with less reliance on labelled data.
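The feature selection and ensemble explainability steps above can be illustrated with a minimal Python sketch. This is not the thesis's implementation, because the abstract does not specify the exact criteria. The following choices are all assumptions:

- the greedy Pearson filter, which ranks features by relevance to the target and then discards redundant ones;
- the `min_relevance` and `redundancy_threshold` cut-offs;
- the mean-rank rule used to combine the SHAP and Sobol rankings.

All function names are hypothetical. SHAP and Sobol scores are taken as precomputed per-feature importances, for example mean absolute SHAP values and total-order Sobol indices, rather than computed here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # Constant columns have no defined correlation; treat them as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_filter(features, target, min_relevance=0.1, redundancy_threshold=0.9):
    """Hypothetical correlation-driven filter (assumed criteria, not the thesis's).

    features: dict mapping feature name -> list of values (one per sample).
    Returns the names of the selected features, most relevant first.
    """
    # Relevance: absolute correlation of each feature with the target.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if relevance[name] < min_relevance:
            continue  # too weakly related to the target
        # Redundancy: skip features highly correlated with one already kept.
        if all(abs(pearson(features[name], features[kept])) < redundancy_threshold
               for kept in selected):
            selected.append(name)
    return selected


def ranks(scores):
    """Map feature name -> rank position (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def ensemble_ranking(shap_importance, sobol_indices):
    """Hypothetical mean-rank aggregation of two attribution rankings.

    Both inputs map the same feature names to precomputed importance scores.
    Returns feature names ordered by mean rank (ties broken alphabetically).
    """
    r_shap, r_sobol = ranks(shap_importance), ranks(sobol_indices)
    mean_rank = {name: (r_shap[name] + r_sobol[name]) / 2 for name in r_shap}
    return sorted(mean_rank, key=lambda name: (mean_rank[name], name))


# Toy usage on made-up data (illustrative only, not thesis results).
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [1.1, 2.0, 3.1, 4.0, 5.1],  # near-duplicate of f1
    "f3": [2.0, 1.0, 2.0, 1.0, 2.0],  # unrelated to the target
}
target = [1.0, 2.0, 3.0, 4.0, 5.0]
print(correlation_filter(features, target))  # ['f1']

shap_scores = {"a": 0.5, "b": 0.3, "c": 0.2}
sobol_scores = {"a": 0.1, "b": 0.6, "c": 0.3}
print(ensemble_ranking(shap_scores, sobol_scores))  # ['b', 'a', 'c']
```

The filter checks relevance before redundancy. As a result, of two highly correlated features, the one more strongly correlated with the target is retained. The ensemble averages rank positions rather than raw scores because SHAP values and Sobol indices are on different scales.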
Case-Study: The proposed methods were validated in the context of medical diagnosis, specifically the early-stage prediction of dementia, using a structural MRI dataset for Alzheimer's disease. In this application, the feature selection described above improved cross-cohort accuracy while reducing data dimensionality. The explainability framework consistently identified clinically relevant regions, such as hippocampal subfields (W. Zhao et al., 2019) and the temporal horn (Vernooij and van Buchem, 2020), supporting the credibility of the feature relevance scores. The data-efficient multi-stage pipeline achieved an accuracy of 73.26%, exceeding prior baselines (Li et al., 2015; Oh et al., 2019). This thesis concludes that correlation- and clustering-based feature selection, ensemble explainability combining SHAP values and Sobol indices, and transfer learning with autoencoders improve the accuracy, robustness, and transparency of ML models. Although validated here on the Alzheimer's task, these methods are domain-agnostic. They offer scalable, reliable, and resource-efficient approaches for high-dimensional, data-limited real-world applications.
| Item Type | Thesis (PhD) |
| URI | https://centaur.reading.ac.uk/id/eprint/127413 |
| Identification Number/DOI | 10.48683/1926.00127413 |
| Divisions | Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science |