
How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction

Pappenberger, F., Ramos, M. H., Cloke, H. L., Wetterhall, F., Alfieri, L., Bogner, K., Mueller, A. and Salamon, P. (2015) How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction. Journal of Hydrology, 522. pp. 697-713. ISSN 0022-1694

Text (Open Access) - Published Version
· Available under License Creative Commons Attribution.

It is advisable to refer to the publisher's version if you intend to cite from this work.

To link to this item, use the DOI: 10.1016/j.jhydrol.2015.01.024


The skill of a forecast can be assessed by comparing the relative proximity of both the forecast and a benchmark to the observations. Example benchmarks include climatology or a naïve forecast. Hydrological ensemble prediction systems (HEPS) are currently transforming the hydrological forecasting environment, but in this new field there is little information to guide researchers and operational forecasters on how benchmarks can best be used to evaluate their probabilistic forecasts. In this study, it is shown that the calculated forecast skill can vary depending on the benchmark selected, and that the selection of a benchmark for determining forecasting system skill is sensitive to a number of hydrological and system factors. A benchmark intercomparison experiment is then undertaken using the continuous ranked probability score (CRPS), a reference forecasting system and a suite of 23 different methods to derive benchmarks. The benchmarks are assessed within the operational set-up of the European Flood Awareness System (EFAS) to determine those that are ‘toughest to beat’ and so give the most robust discrimination of forecast skill, particularly for the spatial average fields that EFAS relies upon. Evaluating against an observed discharge proxy, the benchmark that has the most utility for EFAS and best avoids naïve skill across different hydrological situations is found to be meteorological persistency. This benchmark uses the latest meteorological observations of precipitation and temperature to drive the hydrological model. Hydrological long-term average benchmarks, which are currently used in EFAS, are very easily beaten by the forecasting system, and their use produces considerable naïve skill. When decomposed into seasons, the advanced meteorological benchmarks, which make use of meteorological observations from the past 20 years at the same calendar date, have the most skill discrimination. They are also good at discriminating skill in low flows and for all catchment sizes. Simpler meteorological benchmarks are particularly useful for high flows. Recommendations for EFAS are to move to routine use of meteorological persistency, an advanced meteorological benchmark and a simple meteorological benchmark in order to provide a robust evaluation of forecast skill. This work provides the first comprehensive evidence on how benchmarks can be used to evaluate skill in probabilistic hydrological forecasts, and on which benchmarks are most useful for skill discrimination and for avoiding naïve skill in a large-scale HEPS. It is recommended that all HEPS use the evidence and methodology provided here to evaluate which benchmarks to employ, so that forecasters can trust their skill evaluation and be confident that their forecasts are indeed better.
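To illustrate the benchmark-relative skill framework the abstract describes, the sketch below computes the ensemble CRPS and a skill score (CRPSS) against a benchmark ensemble. This is a minimal sketch, not code from the paper: the data, ensemble size and the climatology-style benchmark are entirely hypothetical, and EFAS itself uses a suite of far more sophisticated benchmarks than the one generated here.

    import numpy as np

    def crps_ensemble(ens, obs):
        """Empirical CRPS for one ensemble forecast against a scalar observation.

        Kernel form: CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|.
        Lower is better; 0 is a perfect deterministic forecast.
        """
        ens = np.asarray(ens, dtype=float)
        term1 = np.mean(np.abs(ens - obs))
        term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
        return term1 - term2

    def crpss(fc_ens, bench_ens, obs_series):
        """Skill of the forecast relative to a benchmark, averaged over time.

        CRPSS = 1 - mean(CRPS_forecast) / mean(CRPS_benchmark);
        positive values mean the forecast beats the benchmark.
        """
        crps_fc = np.mean([crps_ensemble(f, y) for f, y in zip(fc_ens, obs_series)])
        crps_bm = np.mean([crps_ensemble(b, y) for b, y in zip(bench_ens, obs_series)])
        return 1.0 - crps_fc / crps_bm

    # Toy example (hypothetical numbers): 30 forecast dates, 51-member ensembles.
    rng = np.random.default_rng(0)
    obs = rng.gamma(2.0, 50.0, size=30)                    # "observed" discharge
    fc = obs[:, None] + rng.normal(0, 10, size=(30, 51))   # a sharp forecast
    bench = rng.gamma(2.0, 50.0, size=(30, 51))            # climatology-like benchmark
    print(f"CRPSS vs climatology benchmark: {crpss(fc, bench, obs):.2f}")

A CRPSS above zero means the forecast beats the benchmark. As the abstract notes, the harder the benchmark is to beat, the more robust the skill assessment: an easily beaten benchmark, such as a long-term average, inflates the score with naïve skill.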

Item Type: Article
Divisions:
· Interdisciplinary Research Centres (IDRCs) > Walker Institute
· Science > School of Archaeology, Geography and Environmental Science > Department of Geography and Environmental Science
· Interdisciplinary centres and themes > Soil Research Centre
· Science > School of Mathematical, Physical and Computational Sciences > Department of Meteorology
ID Code: 39072
Uncontrolled Keywords: Hydrological ensemble prediction; Forecast performance; Evaluation; Verification; Benchmark; Probabilistic forecasts


