Meteorological data reduction for tropical cyclones using deep learning techniques

Galea, D. (2022) Meteorological data reduction for tropical cyclones using deep learning techniques. PhD thesis, University of Reading.
It is advisable to refer to the publisher's version if you intend to cite from this work. To link to this item DOI: 10.48683/1926.00108656

Abstract/Summary

Tropical cyclones (TCs) are severe weather events which have large human and economic effects, so it is important to be able to understand how their location, frequency and structure might change in a future climate. Analysing future changes in TC frequency and location requires the analysis of high-frequency data from simulations of the future. If this is done by saving simulation data to disk for post-processing, it can be very expensive, so finding methods to avoid writing such data is important. This thesis presents a proof-of-concept study showing that deep learning can be used during model execution to identify TC episodes and only write the data associated with those episodes, leading to reduced data output. (In practice, high-frequency data might be saved for multiple reasons, but an eventual goal could be to use deep learning and other in-situ analyses to identify all phenomena of interest and hence minimise data output.) A crucial problem in developing any TC detection method is establishing ground truth, that is, labelling the input data with the presence or absence of TCs. The volume of data is such that it is simply impractical to manually label data from simulations, and so the method used here is to initially build a deep learning network using actual TC observations as labels for reanalysis data (simulations which attempt to recreate the past), and to train on that reanalysis data. An unavoidable problem is that the reanalysis data itself cannot fully represent reality, and so some of any discrepancy between the deep learning labels and the ground truth can arise from the reanalysis process rather than from a failure of the deep learning.
In addition, this method cannot work for the potential future climates where we need ground-truth labels to evaluate the deep learning techniques as well, and so another objective technique (TRACK, Hodges et al.) is used to provide labels for such data. The influence of labelling errors which can arise from using TRACK can be evaluated by using TRACK on the same reanalysis data used for establishing the initial deep learning network. As a consequence, the proof-of-concept study requires three steps: building and evaluating the deep learning network using observations; comparing and contrasting the results with TRACK operating on the same observations; and then introducing the network into a climate model and evaluating the usage of the deep learning across a range of simulations from different climates. The initial deep learning model, named TCDetect, obtained a recall rate of 92% with a precision rate of 33% when developed using ERA-Interim reanalyses. This means it was detecting, in ERA-Interim data, the vast majority of tropical cyclones present in the original observations. The relatively low precision rate reflected an emphasis on prioritising detection over rejection, but would still represent a significant data reduction. The comparison with TRACK applied to the same reanalysis showed that both methods detected the strongest well-defined cyclones, those with a clear centre of circulation and the least amount of noise (other weather). However, many weaker cyclones are only detected by one of TCDetect/TRACK, or by neither. While TCDetect was only trained for labelling, an attempt to understand the discrepancy was made which utilised location information; this analysis showed that some of the discrepancies (and in particular some of the false positives) were associated with TCDetect erroneously utilising information from outside the tropics and/or in the wrong place.
When integrated into the UK Met Office Unified Model (UM), TCDetect was evaluated using current and future simulated climates at two different horizontal resolutions. For this evaluation, TRACK was used for labelling, and it was found that the version of TCDetect trained on ERA-Interim was not as good as a version trained on the UM output itself, with recall values ranging from 41% to 62% for the former and from 71% to 78% for the latter (although the version trained on the UM data performed adequately on the ERA-Interim data). It was also found that the version trained on low-resolution data can be used on higher-resolution data (recall rate of 78% on low-resolution data vs 71% on high-resolution data). Importantly, it was found that the version trained on current climate performed well on future climates. This could have been helped by the fact that the input variables are climate-invariant. The effect of a change of labelling source was also discussed. The achieved reduction in the volume of analysis data saved to disk was shown to be around 73% of the original data volume. Finally, the method was shown to slow down a coarse-resolution (N96) simulation by around 25% and a finer-resolution (N512) simulation by around 5%, while noting that many optimisations could be achieved.
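The recall and precision figures quoted above, and the link between a detector's output and the amount of data written, can be illustrated with a minimal sketch. It is not the evaluation code used in the thesis: the names `DetectionCounts`, `count_detections` and `fraction_skipped` are hypothetical, the labels and detector output are invented toy values rather than thesis data, and counting skipped timesteps is only one simple proxy for data reduction, which may differ from how the thesis computes its figure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DetectionCounts:
    """Confusion-matrix counts over a set of timesteps (hypothetical helper)."""
    true_pos: int   # TC present in the labels and flagged by the detector
    false_pos: int  # no TC in the labels but flagged by the detector
    false_neg: int  # TC present in the labels but not flagged
    true_neg: int   # no TC in the labels and not flagged


def count_detections(labels: Sequence[bool], flagged: Sequence[bool]) -> DetectionCounts:
    """Compare per-timestep labels with per-timestep detector output."""
    if len(labels) != len(flagged):
        raise ValueError("labels and flagged must have the same length")
    pairs = list(zip(labels, flagged))
    return DetectionCounts(
        true_pos=sum(1 for lab, flag in pairs if lab and flag),
        false_pos=sum(1 for lab, flag in pairs if not lab and flag),
        false_neg=sum(1 for lab, flag in pairs if lab and not flag),
        true_neg=sum(1 for lab, flag in pairs if not lab and not flag),
    )


def recall(c: DetectionCounts) -> float:
    """Fraction of labelled TC timesteps that the detector flagged."""
    denom = c.true_pos + c.false_neg
    return c.true_pos / denom if denom else 0.0


def precision(c: DetectionCounts) -> float:
    """Fraction of flagged timesteps that are labelled as containing a TC."""
    denom = c.true_pos + c.false_pos
    return c.true_pos / denom if denom else 0.0


def fraction_skipped(c: DetectionCounts) -> float:
    """Fraction of timesteps not flagged, i.e. not written if only flagged data is saved."""
    total = c.true_pos + c.false_pos + c.false_neg + c.true_neg
    return (c.false_neg + c.true_neg) / total if total else 0.0


# Invented toy data: ten timesteps, three of which are labelled as containing a TC.
labels = [True, True, True, False, False, False, False, False, False, False]
flagged = [True, True, False, True, False, False, False, False, False, False]

counts = count_detections(labels, flagged)
print(f"recall={recall(counts):.2f}",
      f"precision={precision(counts):.2f}",
      f"skipped={fraction_skipped(counts):.2f}")
```

In this framing, a detector tuned to prioritise detection over rejection flags more timesteps, which raises recall at the cost of precision and reduces the fraction of timesteps that can be skipped.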