Sensitivity analysis of convolutional neural networks: a ranking of hyper-parameter influence and its practical application

Taylor, R. (2023) Sensitivity analysis of convolutional neural networks: a ranking of hyper-parameter influence and its practical application. PhD thesis, University of Reading.
DOI: 10.48683/1926.00118663

Abstract/Summary

The aim of this work is to better understand and quantify the influence of training hyper-parameters on Convolutional Neural Network (CNN) test accuracy using Sensitivity Analysis (SA). The results of the SA produce a general ranking of influence that can inform the reduction of the parameter search space during Hyper-Parameter Optimisation (HPO), increasing the efficiency of the process without compromising model performance. Additionally, a novel metric, Accuracy Gain, was developed to better estimate tuning efficiency and facilitate the comparison of parameter-group performance.

The methodology of this research can be summarised in three parts. Firstly, a framework for SA of Deep Learning (DL) models, SADL, was created; it performs two state-of-the-art SA methods, Sobol Indices and the Morris Method, on CNN hyper-parameters. The resulting sensitivity measures indicate hyper-parameter influence and produce a ranking that informs which parameters should be targeted during HPO. Secondly, Bayesian Optimisation was performed for parameter groups of varying influence, and the Accuracy Gain metric was calculated for each to quantify tuning efficiency. Finally, these results were applied to a real-world scenario in a case study on the classification of colorectal cancer images.

The key findings of this work were the development of a robust framework for SA applied to DL and the demonstration that it is possible to provide empirically based guidance on which parameters to optimise. The SA highlighted batch size, learning rate decay and learning rate decay step as most influential, with batch size significantly more influential than the other hyper-parameters. Conversely, learning rate did not achieve the influence rank expected from the literature. Tuning a subset of influential parameters was more efficient than tuning all parameters; this was confirmed in the case study, where tuning the top three parameters was quicker and achieved higher accuracy than tuning all training parameters, and was also a significant improvement on the parameters explored in the original work.

The implication of this work for practitioners is that they can use this information to guide hyper-parameter tuning efforts, reducing the parameter search space to work within time and resource constraints without compromising model accuracy. Ultimately, these results facilitate the efficient development of optimal DL models. Furthermore, this work provides a framework and clear methodology that future work in this area can follow. Future directions would focus on expanding the scope with additional model architectures, training datasets, hyper-parameters and performance metrics.
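As the full thesis text is not reproduced here, the following is a minimal sketch of the kind of Morris-Method screening described above, using the SALib library. The parameter names, bounds and the placeholder training function are illustrative assumptions, not the configuration used in the thesis.

```python
# Minimal sketch of Morris-Method screening of CNN hyper-parameters with SALib.
# The bounds, parameter names and train_and_evaluate() are illustrative
# assumptions; they are not the thesis's actual configuration.
import numpy as np
from SALib.sample import morris as morris_sample
from SALib.analyze import morris as morris_analyze

# Hyper-parameter space: each sample row is one training configuration.
problem = {
    "num_vars": 4,
    "names": ["batch_size", "learning_rate", "lr_decay", "lr_decay_step"],
    "bounds": [[4, 256], [1e-4, 1e-1], [0.1, 0.99], [1, 50]],
}

def train_and_evaluate(x):
    """Hypothetical stand-in: train a CNN with the given hyper-parameters
    and return its test accuracy. Replace with a real training loop."""
    batch_size = int(round(x[0]))      # discrete parameters are rounded
    lr, decay, step = x[1], x[2], int(round(x[3]))
    return float(np.random.rand())     # placeholder accuracy

# Generate Morris trajectories and evaluate the model at each sampled point.
X = morris_sample.sample(problem, N=20, num_levels=4)
Y = np.array([train_and_evaluate(x) for x in X])

# mu_star ranks overall influence; sigma flags interactions/non-linearity.
Si = morris_analyze.analyze(problem, X, Y, num_levels=4)
for name, mu_star in sorted(zip(problem["names"], Si["mu_star"]),
                            key=lambda t: -t[1]):
    print(f"{name}: mu* = {mu_star:.4f}")
```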
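The second stage, tuning only the top-ranked parameters with Bayesian Optimisation and scoring the run's efficiency, could look like the sketch below, here using scikit-optimize. The abstract does not define the Accuracy Gain metric, so the accuracy_gain() formulation (improvement per unit of tuning cost), along with the search space, objective and baseline numbers, are assumptions for illustration only.

```python
# Sketch: Bayesian Optimisation over a reduced search space (the top-ranked
# parameters only), plus an assumed formulation of the Accuracy Gain metric.
import random
from skopt import gp_minimize
from skopt.space import Integer, Real

def train_and_evaluate_cnn(batch_size, lr_decay, lr_decay_step):
    """Hypothetical stand-in: train the CNN and return its test accuracy."""
    return random.random()

def accuracy_gain(acc_tuned, acc_baseline, tuning_cost):
    """Assumed reading of Accuracy Gain: accuracy improvement per unit of
    tuning cost (e.g. trials or hours); the abstract gives no definition."""
    return (acc_tuned - acc_baseline) / tuning_cost

# Reduced space: only the three parameters ranked most influential by the SA.
space = [
    Integer(4, 256, name="batch_size"),
    Real(0.1, 0.99, name="lr_decay"),
    Integer(1, 50, name="lr_decay_step"),
]

def objective(params):
    # gp_minimize minimises, so negate the accuracy.
    return -train_and_evaluate_cnn(*params)

result = gp_minimize(objective, space, n_calls=30, random_state=0)
best_acc = -result.fun
print("best accuracy:", best_acc)
# Illustrative numbers only: efficiency of this run vs. an untuned baseline.
print("accuracy gain:", accuracy_gain(best_acc, acc_baseline=0.80, tuning_cost=30))
```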