Posted on May 13th, 2015 in Isaac Held's Blog
Fig. 9.8 from the AR5 Working Group 1 IPCC report. Global mean surface temperatures simulated by a set of climate models, shown as anomalies from the time mean over a reference period 1961-1990. Observations (HadCRUT4) in black; ensemble mean in red. On the right (circled) are the mean temperatures in the reference period.
Ch. 9 of the AR5-WG1 report, “Evaluating Climate Models”, is, in my opinion, the most difficult to write of any chapter in that report. You can think of hundreds if not thousands of interesting ways of comparing modern climate models to observations, but which of these is most relevant for judging the quality of a projection of a particular aspect of climate change over the next century? This is an important research problem. Consider this figure, which shows the familiar simulated changes in global mean surface temperature over the past 150 years, in a set of models deposited in the CMIP5 archive, plotted as anomalies from each model's own mean temperature during a reference period (shaded). But the figure also shows, in the narrow panel on the right side circled in red, the models' mean temperatures during that reference period. People tend to be disappointed when they see this: some models are better than others, but the biases in the models' global mean temperatures are typically comparable to the 20th century warming, and in some cases larger. If we are interested in projections of global mean warming over the coming century, or in the attribution of this past warming, should we trust these models at all, given these biases?
To start off, I would claim that it cannot be a valid requirement in general that the bias in some quantity needs to be small compared to the change that we are trying to predict or understand. Suppose we are interested in the forced response of global mean temperature to an increase of 10% or just 1% in CO2 rather than a 100% increase. Do biases in the models that we use for this purpose have to be 10 or 100 times smaller in order to trust their responses to these smaller perturbations? This makes no sense to me. I happen to think that these responses are quite linear over this range, in which case the size of the perturbation obviously has little relevance. But I am hard pressed to imagine any picture in which the bias in global mean temperature would have to be smaller than 0.01C to justify using a model to study global mean temperature responses. (The difficulty of studying very small responses in the presence of internally generated variability is a different issue entirely — if you were really interested in the response to such a small perturbation in a model for theoretical reasons, perhaps to test for linearity, you would have to generate a very large ensemble of simulations to average out the internal variability.)
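To make the linearity argument concrete, here is a back-of-envelope sketch. All numbers are my own illustrative assumptions, not taken from the post: a standard estimate of the forcing per CO2 doubling, a mid-range equilibrium sensitivity, a linear response to forcing, and a rough standard deviation for internally generated variability. The script estimates the response to 1%, 10%, and 100% CO2 increases, and the ensemble size one would need to detect each response above internal noise.

```python
import math

# Illustrative assumptions (not from the post):
F2X = 3.7    # W/m^2 radiative forcing per CO2 doubling (standard estimate)
ECS = 3.0    # K equilibrium warming per CO2 doubling (mid-range value)
SIGMA = 0.1  # K std dev of internally generated variability in a time mean

for pct in (1, 10, 100):
    ratio = 1 + pct / 100
    forcing = F2X * math.log2(ratio)   # CO2 forcing is roughly logarithmic
    dT = ECS * forcing / F2X           # assume response linear in forcing
    # crude ensemble size to detect dT at ~2 sigma against internal noise
    n = math.ceil((2 * SIGMA / dT) ** 2)
    print(f"+{pct:3d}% CO2: forcing {forcing:.3f} W/m2, "
          f"response {dT:.3f} K, ~{n} members to detect")
```

With these numbers the response to a 1% increase is a few hundredths of a degree, and tens of ensemble members would be needed just to see it above internal variability; the response itself scales smoothly with the perturbation, which is the point about linearity.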
On the other hand, consider sea ice extent. If your simulated ice extent is way off, it's going to be hard to simulate the retreat of sea ice quantitatively: interactions between sea ice and the ocean circulation are likely to be seriously distorted, given the complexity of the ocean basin geometry. Plus, sea ice that is too extensive, say, would extend into regions of larger incident solar flux, affecting the strength of the albedo feedback. Sea ice issues are likely to be nonlinear in the sense that the mean state that you are perturbing matters a lot.
Why don’t models do better on the mean temperature? I think it is fair to say that all climate models have parameters in their cloud/convection schemes with which they tune their energy balance. This optimization step is necessary because cloud simulations are simply not good enough to get the energy balance to better than 1 W/m2 from first principles. Is it that some modeling groups are not very good at tuning their models?
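As a cartoon of this tuning step — a zero-dimensional toy with illustrative numbers of my own, nothing like a real cloud scheme — one can bisect on a single albedo-like parameter until a toy energy-balance model matches an observed global mean temperature:

```python
import math

# Toy global energy balance: S/4 * (1 - albedo) = EPS * SIGMA_SB * T^4.
# The "cloud parameter" here is simply the planetary albedo.
S = 1361.0         # solar constant, W/m^2
SIGMA_SB = 5.67e-8 # Stefan-Boltzmann constant
EPS = 0.61         # effective emissivity (crude greenhouse effect)
TARGET_T = 288.0   # K, roughly the observed global mean surface temperature

def equilibrium_T(albedo):
    """Equilibrium temperature for a given planetary albedo."""
    return (S * (1 - albedo) / (4 * EPS * SIGMA_SB)) ** 0.25

# Bisect on the parameter until the simulated temperature hits the target.
lo, hi = 0.0, 0.9
while hi - lo > 1e-6:
    mid = 0.5 * (lo + hi)
    if equilibrium_T(mid) > TARGET_T:
        lo = mid   # too warm: raise the albedo
    else:
        hi = mid
print(f"tuned albedo: {lo:.3f}, T = {equilibrium_T(lo):.1f} K")
```

The real tuning problem differs in every important respect — many interacting parameters, slow ocean adjustment, multiple competing targets — but the basic step, adjusting a free parameter to close the energy budget, is the same.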
One possibility is that there are tradeoffs between optimizing global mean temperature and some other aspect of the simulation. Imagine that a model has a bias in its pole-to-equator temperature gradient, and that it is easier to adjust the model's mean temperature up and down, with some parameter in the cloud scheme perhaps, than to correct the bias in the gradient. The result might be a choice between optimizing the global mean temperature and the sea ice extent. How would you weigh the importance of the bias in sea ice extent against the bias in global mean temperature? I would probably give more weight to the ice, because that is where the sensitivity to the mean state is likely to be stronger.
But this kind of explicit tradeoff is probably not the dominant reason for the bias in global mean temperature in most models. It is more likely that the models' cloud schemes have been tuned to get a good temperature using relatively short runs, and that the models then drift when one does longer multi-century integrations, at which point it may be too expensive to iterate the tuning using these long integrations. So you live with the bias resulting from this slow drift.
Rather than the simulation of the climatology, why not use simulated trends in some quantity of interest as the metric with which to judge the credibility of model projections of that same quantity? If you are confident that the observed trend can be attributed to known forcings, this is fine, but the familiar issues of uncertain aerosol forcing and an uncertain contribution from internal variability make this problematic for global mean temperature, and the same issues arise for other quantities. The more credible and quantitative the attribution claim, the more valuable observed trends are for model evaluation.
Different views on the relative importance of different metrics are partly responsible for the divergence between models. Are you better off with an optimized simulation of the top-of-atmosphere spatial patterns of incoming and outgoing radiative fluxes, or of precipitation patterns? What if a proposed change in a model improves Amazon precipitation but causes African rainfall to deteriorate? If you are interested in how ENSO may evolve in the future under different emission scenarios, is it better to use a metric based on the quality of ENSO in simulations of the past century, or to use the same model to make seasonal forecasts of ENSO and use the skill of those forecasts as a metric? (Of course seasonal forecast skill is important in its own right, but its value as a metric, compared to other possible metrics, for a model of climate change is less self-evident.) If our models were close enough to nature, it would not matter which metric we used to push them even closer, because all metrics would give a consistent picture. Whether different metrics agree or disagree on which of two versions of a model is better is itself a hint of how far one is from a fully satisfactory simulation.
Rather than defining metrics in a subjective way, basically guessing which metrics are most important, you can ask which metrics matter for a particular projection (e.g., of Sahel rainfall). If I sort models using some metric, some way of comparing the model to observations, does this also discriminate between model projections (i.e., between a dry Sahel and a wet Sahel in the future)? If so, and if I believe that this connection is physical, I can use it to sharpen my projection using that model ensemble. If there is no correlation between the metric and the projections, then even if you were convinced that the metric was relevant, there would be no direct way of using it, together with that ensemble of model results, to improve the projection. This approach, sometimes referred to as looking for emergent constraints, strikes me as the most promising for the design of useful metrics. Returning to the figure at the top: should you use a model's mean bias to weight that model's contribution, within a model ensemble, to future projections of global mean temperature? I don't think so. The bias is not strongly correlated with the projected temperature change (see Fig. 9.42 here).
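The emergent-constraint logic can be sketched in a few lines. Everything below is synthetic and invented for illustration — the ensemble, the metric, and the observed value are made up, not real CMIP output: across an ensemble, correlate a present-day metric with the projected change; if the correlation is strong and physically motivated, regress the projection on the metric and plug in the observed metric value to sharpen the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 30-model ensemble (illustrative only, not real CMIP data):
# a common physical factor influences both the present-day metric and
# the projected change, which is what makes the metric informative.
n_models = 30
shared = rng.normal(size=n_models)
metric = shared + 0.3 * rng.normal(size=n_models)
projection = 2.0 + 0.8 * shared + 0.3 * rng.normal(size=n_models)

r = np.corrcoef(metric, projection)[0, 1]
print(f"metric-projection correlation: r = {r:.2f}")

# If the correlation is strong and physically plausible, regress the
# projection on the metric and evaluate at the observed metric value.
slope, intercept = np.polyfit(metric, projection, 1)
obs_metric = 0.5  # hypothetical observed value of the metric
constrained = slope * obs_metric + intercept
print(f"raw ensemble mean:     {projection.mean():.2f}")
print(f"constrained estimate:  {constrained:.2f}")
```

If `r` were near zero, the regression would simply return the ensemble mean, which is the situation described above for the global mean temperature bias: an uncorrelated metric, however reasonable it looks, provides no direct leverage on the projection.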
[The views expressed on this blog are in no sense official positions of the Geophysical Fluid Dynamics Laboratory, the National Oceanic and Atmospheric Administration, or the Department of Commerce.]