Towards improving the framework for probabilistic forecast evaluation

Smith, L. A., Suckling, E. B., Thompson, E. L., Maynard, T. and Du, H. (2015) Towards improving the framework for probabilistic forecast evaluation. Climatic Change, 132 (1). pp. 31-45. ISSN 1573-1480
It is advisable to refer to the publisher's version if you intend to cite from this work. DOI: 10.1007/s10584-015-1430-2

Abstract/Summary

The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), the Proper Linear (PL) score, and I. J. Good's logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Benchmark forecasts from empirical models such as Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than comparing systems based on similar physical simulation models only with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2, and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average "distance" between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.