Systematic benchmarking of climate models: methodologies, applications, and new directions

Text - Accepted Version
Restricted to Repository staff only. The copyright of this document has not yet been checked; this may affect its availability.


It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.


Hassler, B., Hoffman, F. M., Beadling, R., Blockley, E., Huang, B., Lee, J., Lembo, V., Lewis, J., Lu, J., Madaus, L., Malinina, E., Medeiros, B., Pokam, W., Scoccimarro, E. and Swaminathan, R. ORCID: https://orcid.org/0000-0001-5853-2673 (2026) Systematic benchmarking of climate models: methodologies, applications, and new directions. Reviews of Geophysics, 64 (1). e2025RG000891. ISSN 8755-1209 doi: 10.1029/2025RG000891

Abstract/Summary

As climate models become increasingly complex, there is a growing need to comprehensively and systematically assess model performance with respect to observations. Given the increasing number and diversity of climate model simulations in use, the community has moved beyond simple model intercomparison toward developing methods capable of benchmarking a large number of simulations against a suite of climate metrics. Here, we present a detailed review of evaluation and benchmarking methods and approaches developed in the last decade, focusing primarily on scientific implications for Coupled Model Intercomparison Project (CMIP) simulations and CMIP6 results that contributed to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Based on this review, we explain the resulting contemporary philosophy of model benchmarking and provide clear distinctions and definitions of the terms model verification, process validation, evaluation, and benchmarking. While significant progress has been made in model development based on systematic evaluation and benchmarking efforts, some climate system biases persist. The development of open-source community software packages has played a fundamental role in identifying areas of significant model improvement and bias reduction. We review the key features of several software packages that have been commonly used over the past decade to evaluate and benchmark global and regional climate models. Additionally, we discuss best practices for selecting evaluation and benchmarking metrics and interpreting the results obtained, the importance of choosing suitable sources of reference data, and accurate uncertainty quantification.


Item Type: Article
URI: https://centaur.reading.ac.uk/id/eprint/128421
Identification Number/DOI: 10.1029/2025RG000891
Refereed: Yes
Divisions: Science > School of Mathematical, Physical and Computational Sciences > National Centre for Earth Observation (NCEO); Science > School of Mathematical, Physical and Computational Sciences > Department of Meteorology
Publisher: American Geophysical Union
