To integrate the equations detailed in Chapter 4, a volume of ocean is divided into a large number of rectangular cells within which the equations are solved by finite difference techniques. Storage for each variable, along with other quantities such as fluxes through cell faces, must be allocated for each cell. As noted above, if storage were allocated entirely within memory, the maximum attainable resolution on a single processor would be severely limited. This restriction can be greatly relaxed by allocating the bulk of storage on a secondary device such as disk (preferably solid state) and allocating only enough memory to integrate the equations for one slice through the ocean's volume at a time. Successively reading slices from disk into memory, integrating, and writing updated slices back to disk (footnote 11.5) allows the equations to be solved for the entire volume of ocean using significantly less memory than the total storage requirement. This is the essence of a memory window, which is described in more detail below. For models with domain sizes which are small compared to available memory, or on architectures where disk access may be prohibitively slow, an option is available (Section 24.3) to turn the disk area into a ramdrive.
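The read, integrate, write cycle described above can be sketched as follows. This is a minimal illustration in Python rather than the model's own Fortran. The file layout (one contiguous record of imt*km double-precision values per latitude row) and the names `read_row`, `write_row`, `integrate_by_slices`, and `demo` are assumptions made for this example and are not taken from the model code. The pointwise update is only a placeholder for the real solver.

```python
import os
import tempfile
from array import array

ITEM = array("d").itemsize  # bytes per double-precision value


def read_row(f, j, row_len):
    """Read latitude row j (row_len values) from the open 'disk' file."""
    f.seek(j * row_len * ITEM)
    row = array("d")
    row.fromfile(f, row_len)
    return row


def write_row(f, j, row):
    """Write an updated latitude row back to its place on 'disk'."""
    f.seek(j * len(row) * ITEM)
    row.tofile(f)


def integrate_by_slices(path, imt, km, jmt, step):
    """Update every cell while holding only one imt*km slice in memory."""
    row_len = imt * km
    with open(path, "r+b") as f:
        for j in range(jmt):
            row = read_row(f, j, row_len)   # disk -> memory
            for n in range(row_len):        # placeholder for the real solver
                row[n] = step(row[n])
            write_row(f, j, row)            # memory -> disk
    return row_len  # number of values held in memory at any one time


def demo(imt=4, km=3, jmt=5):
    """Fill a temporary 'disk' with zeros, add 1 to every cell, read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            array("d", [0.0] * (imt * km * jmt)).tofile(f)
        in_memory = integrate_by_slices(path, imt, km, jmt, lambda x: x + 1.0)
        with open(path, "rb") as f:
            result = array("d")
            result.fromfile(f, imt * km * jmt)
        return in_memory, list(result)
    finally:
        os.remove(path)
```

With the default arguments, `demo()` holds 12 values in memory while updating all 60 values on disk, so the memory needed is smaller than total storage by a factor of jmt.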
There are various ways to slice through a volume of data. As an
example, refer to Figure 11.1a which illustrates a three
dimensional block of data on disk arranged such that there are
imt longitudes, jmt latitudes, and km depth levels. As indicated,
slicing the volume of data along surfaces of constant longitude,
latitude, and depth requires that when brought into central memory,
the arrays needed to hold the slices be of size $jmt \times km$,
$imt \times km$, or $imt \times jmt$.
Perhaps the most intuitive way of dimensioning arrays
is Array(imt,jmt). However, the number of model latitudes
jmt is usually comparable (within a factor of 2) to the number of
longitudes imt, whereas the number of depth levels km is typically 1/5
to 1/10 the number of longitudes. Therefore, dimensioning slices as
Array(imt,jmt) can be eliminated as being too wasteful of memory. Even
though dimensioning slices as Array(jmt,km) requires the least space,
it is less favorable based on other considerations: chiefly, the added
cost of more vector startups (discussed below) and the desire to easily
reference longitudinal sections along circles of constant latitude
(e.g. for diagnostic purposes and polar filtering). Based on the
above, slices are dimensioned as Array(imt,km).
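The memory argument can be made concrete with a short calculation. The sketch below is in Python, not the model's Fortran, and the grid dimensions (360 longitudes, 180 latitudes, 30 depth levels) are hypothetical values chosen only to match the typical ratios quoted above. The function name `slice_sizes` is likewise an assumption of this example.

```python
def slice_sizes(imt, jmt, km):
    """Number of values in one slice for each slicing direction."""
    return {
        "Array(jmt,km)": jmt * km,    # slice at constant longitude
        "Array(imt,km)": imt * km,    # slice at constant latitude
        "Array(imt,jmt)": imt * jmt,  # slice at constant depth
    }


# Hypothetical grid: jmt is half of imt, km is 1/12 of imt.
sizes = slice_sizes(imt=360, jmt=180, km=30)
for name, n in sizes.items():
    print(f"{name}: {n} values")
```

For this hypothetical grid, a constant-depth slice holds six times as many values as a constant-latitude slice, which is why Array(imt,jmt) is ruled out first.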
The reason that slices are dimensioned as Array(imt,km) instead of as Array(km,imt) is largely historical and again based on speed: the inner dimension is the vector dimension in Fortran, and longer vectors execute faster than shorter ones. In addition, vector startup time can be significant for short vectors. Therefore the idea is to have long vectors and as few of them as possible (footnote 11.6) on vector computers. Essentially, these ideas led to dimensioning slices as Array(imt,km) to represent a longitudinal section of data along a circle of constant latitude. In general, more than one latitude row is needed in memory to solve the equations, and so arrays are dimensioned with a local latitude index as in Array(imt,km,jmw), where the number of local latitudes is typically jmw=3.
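The trade-off between vector length and vector count can be illustrated with a toy cost model. The sketch below is in Python, not Fortran, and everything in it is an assumption made for illustration: the function names, the grid size, and in particular the `startup` and `per_element` costs, which are arbitrary units and not measurements from any machine.

```python
def sweep_cost(inner, outer, startup, per_element):
    """Toy cost of sweeping a 2-D array dimensioned Array(inner, outer).

    The inner dimension is the vector dimension, so the sweep issues
    `outer` vectors, each of length `inner`, and every vector pays a
    fixed startup cost.
    """
    n_vectors, length = outer, inner
    return n_vectors * (startup + length * per_element)


def window_values(imt, km, jmw=3):
    """Number of values in one array dimensioned Array(imt,km,jmw)."""
    return imt * km * jmw


# Hypothetical grid and hypothetical costs (arbitrary units).
imt, km = 360, 30
cost_imt_inner = sweep_cost(inner=imt, outer=km, startup=50, per_element=1)
cost_km_inner = sweep_cost(inner=km, outer=imt, startup=50, per_element=1)
print(cost_imt_inner, cost_km_inner, window_values(imt, km))
```

Under these assumed costs, Array(imt,km) needs 30 vectors of length 360 and Array(km,imt) needs 360 vectors of length 30. Both layouts touch the same number of elements, so the difference in cost comes entirely from the number of startups.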
On cache-based processors (cache being fast on-chip memory) of the type seen in distributed platforms, the size of cache has a significant impact on speed. Smaller arrays are more likely to fit within available cache than larger arrays, and this results in speed improvements. In fact, when multi-tasking, fitting arrays into available cache is the reason for observed superlinear speedups as the number of processors is increased. In this case, vectors are not as important as fitting arrays into cache. However, performance is not as simple as just cache size. On-chip bottlenecks can occur for a variety of reasons which are beyond the scope of this documentation. Regardless of chip design details, the focus should be to write code which is scientifically clear and general enough to stand through time. Incorporating code to correct a compiler's shortcomings is to be discouraged since the coding quirk is likely to survive long after the compiler is gone. If carried to excess, the accumulation of antiquated "speed ups" serves only to obscure the science behind the code. In the long run, faster integrations are the result of more appropriate numerical techniques, better optimizing compilers, and faster computers.
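The cache argument can be illustrated with a rough calculation of when a single Array(imt,km,jmw) fits in cache. The sketch below is in Python and rests on several assumptions that are not taken from the model: 8-byte values, a 64 KiB cache, a hypothetical 360 by 30 grid, and an even split of the longitude dimension among processors. The function names are also invented for this example, and the split shown is not necessarily how the model divides work.

```python
import math


def window_bytes(imt, km, jmw=3, bytes_per_value=8):
    """Bytes occupied by one array dimensioned Array(imt,km,jmw)."""
    return imt * km * jmw * bytes_per_value


def smallest_fitting_processor_count(imt, km, cache_bytes, jmw=3, max_procs=1024):
    """Smallest processor count for which each processor's share fits in cache.

    Assumes (hypothetically) that longitudes are divided evenly among
    processors. Returns None if no count up to max_procs is enough.
    """
    for p in range(1, max_procs + 1):
        local_imt = math.ceil(imt / p)
        if window_bytes(local_imt, km, jmw) <= cache_bytes:
            return p
    return None


# Hypothetical grid and cache size.
print(window_bytes(360, 30))
print(smallest_fitting_processor_count(360, 30, cache_bytes=64 * 1024))
```

Under these assumptions the full array is several times larger than the cache, and each processor's share first fits once the work is divided four ways. Beyond that point the per-processor arrays stay cache-resident, which is the kind of effect that can produce superlinear speedups.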