STATISTICS & DATA ANALYSIS TECHNIQUES
Introduction
A scientist must call upon different techniques in order to carry out
research. One of the main tools used at GFDL is that of the physical
(mathematical) model, for example a GCM, used to simulate the physical system
using the basic laws of physics. Another approach, which is of particular
interest to me, is to use statistical tools, based on a different set of laws,
to examine data. While meteorologists are generally trained in basic
traditional statistics, there is a wealth of statistical knowledge that the
meteorological community has yet to tap. I try to keep abreast of
developments in the field of statistics (though my limited background is a
hindrance) in the hope of stumbling upon something useful.
Eigenvector Analysis
One of my main specialties in this area is the use of eigenvector analysis
techniques, which encompass Principal Component/Empirical Orthogonal
Function (EOF) analysis. In some earlier projects that were predictive in
nature (Harnack et al., 1982a; Harnack and Lanzante, 1984; Harnack and
Lanzante, 1985; Harnack et al., 1986a/b/c), EOFs were used to extract the
major signals in oceanic and atmospheric fields. These techniques are also
quite useful in diagnostic studies (Harnack et al., 1982a; Lanzante, 1984;
Lanzante and Harnack, 1984; Lanzante, 1990; Lanzante, 1991; Lanzante, 1996).
Over the course of these studies I have come to appreciate the virtues and
nuances of rotating the eigenvectors.
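The EOF calculation and the rotation mentioned above can be sketched in Python as follows. This is a minimal illustration that assumes NumPy and a (time x space) data matrix. It is not the code used in the studies cited, and the function names `eof_analysis` and `varimax` are hypothetical. The EOFs are taken from the singular value decomposition of the anomaly matrix. The leading loadings are then given an orthogonal varimax rotation, which is one common choice among several possible rotations.

```python
import numpy as np


def eof_analysis(data, n_modes):
    """EOF analysis of a (time x space) data matrix via the SVD.

    Returns the spatial EOFs (n_modes x space), the principal-component
    time series (time x n_modes), the eigenvalues (variance per mode) and
    the fraction of total variance explained by each mode.
    """
    n_time = data.shape[0]
    anomalies = data - data.mean(axis=0)  # remove the time mean at each point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eigenvalues = s**2 / (n_time - 1)     # eigenvalues of the covariance matrix
    explained = eigenvalues / eigenvalues.sum()
    patterns = vt[:n_modes]               # orthonormal spatial patterns
    pcs = u[:, :n_modes] * s[:n_modes]    # projections of the anomalies on the EOFs
    return patterns, pcs, eigenvalues[:n_modes], explained[:n_modes]


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (space x modes) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix for the varimax criterion, projected back onto the
        # nearest orthogonal matrix with an SVD.
        target = rotated**3 - (gamma / p) * rotated * np.sum(rotated**2, axis=0)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Synthetic example: 120 time steps x 50 grid points of random data.
rng = np.random.default_rng(0)
field = rng.standard_normal((120, 50))
patterns, pcs, eigvals, explained = eof_analysis(field, n_modes=4)
loadings = patterns.T * np.sqrt(eigvals)  # scale each EOF by its amplitude
rotated, rotation = varimax(loadings)
```

Because the rotation is orthogonal, the total variance carried by the retained loadings is unchanged. Only its distribution among the modes shifts, usually toward simpler and more localized patterns.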
In my study of the relationships between the atmospheric circulation in the Western Hemisphere and sea surface temperatures in the North Pacific and North Atlantic, I employed a variation on EOF analysis that was a forerunner of SVD (Singular Value Decomposition) analysis, which has gained popularity in recent years. I have also found complex eigenvector analysis to be quite powerful for examining phenomena that propagate or evolve over their lifetime (Lanzante, 1990, 1991, 1996).
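Complex eigenvector analysis can be sketched in a similar way. The sketch assumes the common Hilbert-transform formulation: each time series is augmented with its 90-degree phase-shifted counterpart (via `scipy.signal.hilbert`), and the resulting complex matrix is decomposed by SVD. The function name `complex_eofs` and the synthetic travelling wave are illustrative assumptions, not taken from the papers cited. Ordinary EOF analysis splits a propagating wave into a pair of modes. A single complex mode captures the same wave, and its spatial phase changes steadily with position.

```python
import numpy as np
from scipy.signal import hilbert


def complex_eofs(data, n_modes):
    """Complex (Hilbert) EOF analysis of a real (time x space) data matrix.

    Returns complex spatial patterns (n_modes x space), complex PC time
    series (time x n_modes) and the fraction of variance per mode.
    """
    anomalies = data - data.mean(axis=0)
    # Analytic signal along time: the real part is the data, the imaginary
    # part is its Hilbert transform (each Fourier component shifted 90 degrees).
    analytic = hilbert(anomalies, axis=0)
    u, s, vh = np.linalg.svd(analytic, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    patterns = vh[:n_modes].conj()        # eigenvectors of the complex covariance matrix
    pcs = u[:, :n_modes] * s[:n_modes]    # complex principal components
    return patterns, pcs, explained[:n_modes]


# Synthetic travelling wave plus a little noise.
n_time, n_space = 200, 40
t = np.arange(n_time)[:, None]
x = np.arange(n_space)[None, :]
omega = 2 * np.pi * 10 / n_time           # 10 cycles over the record
k = 2 * np.pi * 2 / n_space               # 2 wavelengths across the domain
rng = np.random.default_rng(1)
field = np.cos(k * x - omega * t) + 0.1 * rng.standard_normal((n_time, n_space))

patterns, pcs, explained = complex_eofs(field, n_modes=2)
spatial_amplitude = np.abs(patterns[0])
spatial_phase = np.unwrap(np.angle(patterns[0]))  # changes steadily with x
```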
Resampling & Monte Carlo Simulation
Screening multiple linear regression was also used extensively in the
predictive studies cited in the previous section and was the focus of
Lanzante (1984a), which addressed model building and the assessment of skill
and significance. Resampling (jackknife, bootstrap, cross-validation) and
other Monte Carlo approaches have been used in most of my publications. In
the study of climate it is often the case that the statistical significance
of a particular test cannot be assessed using a traditional approach, either
because the data are not independent in time and/or space or because it has
been necessary to embark on a "statistical fishing expedition" in a small
sample of data. Monte Carlo strategies can help in such cases by
substituting raw computing power for certain statistical assumptions. Of
course, these Monte Carlo schemes must be designed with careful thought;
each problem may require a somewhat different approach.
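The sketch below shows one way to substitute computation for an independence assumption. It uses a moving-block bootstrap to put a confidence interval on the mean of a serially correlated series. It assumes NumPy, and the function name, block length and synthetic AR(1) series are illustrative choices rather than a scheme taken from the publications cited. Resampling blocks of consecutive values, rather than individual values, preserves short-range autocorrelation. The resulting interval is therefore wider, and more honest, than a naive one that treats every value as independent.

```python
import numpy as np


def moving_block_bootstrap_ci(x, block_length, n_boot=2000, alpha=0.05, seed=None):
    """Percentile bootstrap interval for the mean of a serially correlated series.

    Overlapping blocks of `block_length` consecutive values are drawn with
    replacement and concatenated, which keeps the autocorrelation within
    each block intact.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    n_blocks = int(np.ceil(n / block_length))
    n_starts = n - block_length + 1       # number of distinct overlapping blocks
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        sample = np.concatenate([x[s:s + block_length] for s in starts])[:n]
        boot_means[b] = sample.mean()
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Synthetic AR(1) series: each value depends strongly on the previous one.
rng = np.random.default_rng(42)
phi, n = 0.7, 200
series = np.zeros(n)
for i in range(1, n):
    series[i] = phi * series[i - 1] + rng.standard_normal()

lo, hi = moving_block_bootstrap_ci(series, block_length=10, seed=0)
# Naive 95% interval that wrongly treats the values as independent.
naive_width = 2 * 1.96 * series.std(ddof=1) / np.sqrt(n)
block_width = hi - lo
```

The block length must be long compared with the decorrelation time of the series. Choosing it is the kind of problem-specific design decision that each application requires.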
Resistant, Robust & Nonparametric Techniques
Over the last several years I have grappled with the quality control of
radiosonde data and with the general problem of analyzing "messy data", by
which I mean data that are contaminated by outliers or are not Normally
distributed. The problem is that these "defects" in the data can invalidate
most of the common statistical techniques that meteorologists regularly
employ. As it turns out, statisticians have alternative techniques that are
not much affected by these defects.
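The following minimal illustration shows why resistance matters. It assumes NumPy and screens ten hypothetical values for outliers, once with the median and the median absolute deviation (MAD) and once with the mean and standard deviation. The MAD is scaled by 1.4826 so that it matches the standard deviation for Normal data. The data, function names and the threshold of 3 are illustrative assumptions, not a published quality-control procedure.

```python
import numpy as np


def robust_z(x):
    """Resistant z-scores based on the median and the scaled MAD."""
    x = np.asarray(x, dtype=float)
    center = np.median(x)
    mad = np.median(np.abs(x - center))
    spread = 1.4826 * mad  # matches the standard deviation for Normal data
    return (x - center) / spread, center, spread


def classical_z(x):
    """Conventional z-scores based on the mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


# Ten hypothetical observations, one of which is a gross error.
obs = np.array([10, 11, 9, 10, 12, 10, 11, 9, 10, 100])
z_robust, center, spread = robust_z(obs)
classical_flags = np.abs(classical_z(obs)) > 3  # outlier scores only about 2.84
robust_flags = np.abs(z_robust) > 3             # outlier scores about 60.7
```

The gross error inflates the standard deviation so much that it hides itself. With only ten values, no classical z-score can exceed (n-1)/sqrt(n), about 2.85. The resistant z-score of the suspect value, by contrast, is about 61. If more than half the values are identical, the MAD is zero, so a practical screen needs a fallback for that case.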
In a review article which I have written on this subject, I present some such
alternatives to the mean, standard deviation, t-test, correlation, regression,
and some other commonly employed statistical measures. Some of these
alternatives have been adopted for quality control in NCDC's
Global Historical Climatology Network (GHCN), a collection of monthly
land surface observations from several thousand stations. The GHCN is a WMO
global baseline data set that is widely used in the study of climate variability.
Similarly some of these methods have been employed in the creation of a
Spectral Analysis
During the early 1990s I was introduced to a relatively new approach to
spectral analysis known as multitaper spectral analysis. This approach was
devised by David Thomson of AT&T Bell Labs during the early 1980s and is
clearly documented in the text by Percival and Walden (1993), "Spectral
Analysis for Physical Applications: Multitaper and Conventional Univariate
Techniques". Traditional spectral analysis applies a single taper to the
data, followed by a Fourier transform that yields periodogram estimates,
which are then smoothed using a window. The tapering is aimed at reducing
leakage, which biases the estimated spectrum. The windowing is aimed at
reducing the variance of the estimated spectrum, but at the cost of reduced
frequency resolution. By contrast, multitaper spectral analysis applies
several mutually orthogonal tapers to the data and then averages the
separate periodogram estimates (one from each taper) instead of windowing.
The result is that, for a given bandwidth, the multitaper method generally
produces a better combination of low bias and low variance than the
traditional method or, conversely, for a given bias and variance the
multitaper resolution is greater. In future work I plan to use the multitaper
approach whenever spectral or cross-spectral analysis is appropriate.
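The multitaper procedure described above can be sketched with the discrete prolate spheroidal (Slepian) sequences provided by `scipy.signal.windows.dpss`. This is a simplified sketch under stated assumptions. The eigenspectra are averaged with equal weights, as in the description above, rather than with Thomson's adaptive weighting. The time-bandwidth product NW = 4 with K = 2NW - 1 tapers is a conventional default, not a recommendation from the text. The function name and the synthetic test signal are hypothetical.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_psd(x, fs=1.0, nw=4.0, n_tapers=None):
    """Multitaper spectrum estimate with equally weighted eigenspectra.

    x        : 1-D series sampled at rate fs
    nw       : time-bandwidth product N*W
    n_tapers : number of Slepian tapers (default 2*NW - 1)
    Returns frequencies (0 to fs/2) and the two-sided spectral density there.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    if n_tapers is None:
        n_tapers = int(2 * nw) - 1
    tapers = dpss(n, nw, Kmax=n_tapers)    # (n_tapers, n), each with unit energy
    # One tapered periodogram ("eigenspectrum") per taper.
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2 / fs
    psd = eigenspectra.mean(axis=0)        # average instead of smoothing with a window
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


# Synthetic test: a sinusoid at 0.1 cycles per sample buried in white noise.
rng = np.random.default_rng(7)
n = 512
t = np.arange(n)
series = np.sin(2 * np.pi * 0.1 * t) + rng.standard_normal(n)
freqs, psd = multitaper_psd(series, fs=1.0, nw=4.0)
```

Increasing NW widens the analysis bandwidth (roughly plus or minus NW/N cycles per sample) and permits more tapers. This trades resolution for lower variance, the same trade-off that the smoothing window controls in the traditional method.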
Other
In the course of studying climate variability and change, statistical issues
sometimes arise that must be dealt with in their own right. In one instance
we developed an approach to studying the nature of climate change by
comparing linear and nonlinear measures of change (Seidel and Lanzante,
2004). In another instance I was motivated by comments made during a climate
workshop. A subsequent examination of the recent literature confirmed my
suspicion that there is widespread misunderstanding regarding the use of
"error bars" in the analysis of climate data. I pointed this out, along with
some simple illustrative examples, in a brief correspondence (Lanzante,
2005).
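One well-known pitfall of this kind is judging whether two estimates differ significantly by checking whether their 95% error bars overlap. It is used here purely as an assumed illustration, not as a summary of the correspondence. The sketch below uses Python's standard library, assumes independent, approximately Normal estimates, and uses hypothetical numbers. Take two estimates with equal standard errors s. Their intervals overlap whenever the difference is less than about 3.92s, yet the difference itself is significant once it exceeds about 2.77s (1.96 times sqrt(2) s). Overlapping error bars therefore do not imply the absence of a significant difference.

```python
from statistics import NormalDist

# Two-sided 95% critical value of the standard Normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)


def intervals_overlap(mean1, se1, mean2, se2):
    """True if the two 95% error bars (mean +/- z*se) overlap."""
    return abs(mean1 - mean2) < z * (se1 + se2)


def difference_significant(mean1, se1, mean2, se2):
    """True if the difference is significant at the 5% level.

    Assumes the two estimates are independent, so the standard error of the
    difference is sqrt(se1**2 + se2**2), which is smaller than se1 + se2.
    """
    se_diff = (se1**2 + se2**2) ** 0.5
    return abs(mean1 - mean2) > z * se_diff


# Hypothetical estimates 3.0 apart, each with a standard error of 1.0.
print(intervals_overlap(0.0, 1.0, 3.0, 1.0))       # True  (3.0 < 3.92)
print(difference_significant(0.0, 1.0, 3.0, 1.0))  # True  (3.0 > 2.77)
```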
Plans
I continue to peruse a number of statistical journals regularly for the
latest developments. The resistant, robust and nonparametric techniques that
I have recently discovered are being employed to a considerable degree in my
ongoing research on climate data quality and long-term climate variability,
as well as in studies of short-term climate variability.
Return to John Lanzante's Home Page.
