Semi-automatic assessment of I/O behavior by inspecting the individual client-node timelines— an explorative study on 10^6 jobsBetke, E. and Kunkel, J. (2020) Semi-automatic assessment of I/O behavior by inspecting the individual client-node timelines— an explorative study on 10^6 jobs. In: ISC HPC, 21-25 Jun 2020, Frankfurt, Germany, pp. 166-184, https://doi.org/10.1007/978-3-030-50743-5_9.
It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing. To link to this item DOI: 10.1007/978-3-030-50743-5_9 Abstract/SummaryHPC applications with suboptimal I/O behavior interfere with well-behaving applications and lead to increased application runtime. In some cases, this may even lead to unresponsive systems and unfinished jobs. HPC monitoring systems can aid users and support staff to identify problematic behavior and support optimization of problematic applications. The key issue is how to identify relevant applications? A profile of an application doesn’t allow to identify problematic phases during the execution but tracing of each individual I/O is too invasive. In this work, we split the execution into segments, i.e., windows of fixed size and analyze profiles of them. We develop three I/O metrics to identify three relevant classes of inefficient I/O behaviors, and evaluate them on raw data of 1,000,000 jobs on the supercomputer Mistral. The advantages of our method is that temporal information about I/O activities during job runtime is preserved to some extent and can be used to identify phases of inefficient I/O. The main contribution of this work is the segmentation of time series and computation of metrics (Job-I/O-Utilization, Job-I/O-Problem-Time, and Job-I/O-Balance) that are effective to identify problematic I/O phases and jobs.
Download Statistics DownloadsDownloads per month over past year Altmetric Deposit Details University Staff: Request a correction | Centaur Editors: Update this record |