Here are some questions and answers about the memory window and related issues:
There are 12 arrays required to hold three time levels of prognostic data for two horizontal velocity components and two tracers within the memory window. The arrays are dimensioned as A(imt,km,jmw). For the simplest options, the workspace part of the memory window must allow space for 3 advective velocities on the faces of T-cells, 3 advective velocities on the faces of U-cells, 3 temporary arrays for use as fluxes on cell faces, 1 array for density, and 2 arrays for land/sea masks. If it is assumed that all of these 12 arrays in the workspace part are dimensioned as above, then 50% of the space within the memory window is workspace and 50% is prognostic data. This assumption applies when the memory window is fully opened (jmw=jmt for second order numerics). The percentage of space taken by workspace arrays is significantly less when the window is at its minimum size (jmw=3). But the minimum window is not the constraining configuration.
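The array count above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not MOM code); the grid sizes and the function name are hypothetical, chosen only for illustration, and every array is assumed to be dimensioned (imt,km,jmw) as stated above.

```python
def memory_window_words(imt, km, jmw, n_prognostic, n_workspace):
    """Return (prognostic_words, workspace_words, workspace_fraction).

    Assumes every array, prognostic or workspace, is dimensioned
    (imt, km, jmw), as in the fully opened second order window.
    """
    words_per_array = imt * km * jmw
    prognostic = n_prognostic * words_per_array
    workspace = n_workspace * words_per_array
    return prognostic, workspace, workspace / (prognostic + workspace)


# 3 time levels x (2 horizontal velocity components + 2 tracers)
n_prognostic = 3 * (2 + 2)

# 3 T-cell advective velocities + 3 U-cell advective velocities
# + 3 flux temporaries + 1 density + 2 land/sea masks
n_workspace = 3 + 3 + 3 + 1 + 2

# Hypothetical grid sizes; jmw = jmt for a fully opened second order window.
imt, km, jmt = 92, 15, 60
prog, work, frac = memory_window_words(imt, km, jmt, n_prognostic, n_workspace)
print(n_prognostic, n_workspace, frac)
```

Because both groups contain 12 arrays of the same shape, the fraction does not depend on the grid sizes chosen.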
Refer to Fig 11.8a, which shows a typical situation when MOM 3 is configured to solve a problem that is too big to fit into memory on a conventional vector platform such as the CRAY T90 at GFDL. All latitude rows are placed on disk. This is viable only if the disk is a solid state device such as the SSD on the CRAY. The option for keeping data on solid state disk on the CRAY T90 is ssread_sswrite.
Figure 11.8b indicates what happens when option ramdrive is used instead of option ssread_sswrite. This option is appropriate for platforms that do not have solid state disk. Note that in comparison to figure 11.8a, the memory has increased dramatically and no disk space is required. The ramdrive is just an array dimensioned large enough to hold all latitude rows. Reads and writes to the ramdrive are replaced by memory to memory copies.
Figure 11.8c is the situation when the memory window is fully opened up. The options for this configuration are ramdrive and max_window. In this case, all latitude rows are stored directly in the memory window. For large models with many latitude rows, the fully opened memory window takes three times as much memory space as the case in figure 11.8b. For more detail, refer to Section 37.1.
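The ramdrive idea described for figure 11.8b can be illustrated with a small sketch. This is not MOM's implementation: the class and method names below are hypothetical, and a row is reduced to a short list of numbers. It shows only that "reads" and "writes" become memory to memory copies into an array large enough to hold all latitude rows.

```python
class RamDrive:
    """Hypothetical stand-in for disk: one in-memory slot per latitude row."""

    def __init__(self, jmt, row_len):
        self.rows = [[0.0] * row_len for _ in range(jmt)]

    def write_row(self, jrow, data):
        # A memory to memory copy replaces the disk write (jrow is 1-based).
        self.rows[jrow - 1][:] = data

    def read_row(self, jrow):
        # A memory to memory copy replaces the disk read.
        return list(self.rows[jrow - 1])


jmt, row_len = 6, 4
drive = RamDrive(jmt, row_len)
for jrow in range(1, jmt + 1):
    drive.write_row(jrow, [float(jrow)] * row_len)

# Load a minimum (3 row) window holding rows 2, 3 and 4.
window = [drive.read_row(jrow) for jrow in (2, 3, 4)]

# Work on the window copy; the ramdrive is unchanged until written back.
window[1][0] = -1.0
before_writeback = drive.read_row(3)[0]
drive.write_row(3, window[1])
after_writeback = drive.read_row(3)[0]
print(before_writeback, after_writeback)
```

In this sketch both the ramdrive and the window occupy memory at the same time, which is the extra memory cost noted for figure 11.8b relative to figure 11.8a.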
Yes and no. When fully opened, the number of latitude rows in the memory window is given by jmw=(jmt-2) + 2*jbuf. Consider the case of one processor: global arrays are dimensioned as A(imt,1:jmt) and memory window arrays are dimensioned as B(imt,km,1:jmw). For a second order window (when equations require second order numerics), jbuf=1 so jmw=jmt and jrow=1 references the same latitude row as j=1 in the memory window array.
However, the same is not true for a fourth order window. The reason is that jbuf=2 which makes the memory window arrays dimensioned as B(imt,km,1:jmt+2) while global arrays are still dimensioned as A(imt,1:jmt). The relationship between indices is such that jrow=1 in the global arrays references the same latitude row as j=2 in the memory window arrays.
To make the local and global latitude indices the same for all cases, the memory window arrays would have to be made allocatable and dimensioned as B(imt,km,jscomp-2*jbuf:jecomp+2*jbuf) where ``jscomp'' and ``jecomp'' are functions of the processor. Allocating the arrays in this way currently slows the model down by 30%.
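The index relationships above for a fully opened window on one processor can be written out as a short sketch. The formula for jmw is taken from the text. The general offset j = jrow + (jbuf - 1) is an assumption inferred from the two cases stated (jbuf=1 and jbuf=2), not a formula given in the source, and the function names are hypothetical.

```python
def window_rows(jmt, jbuf):
    """Number of latitude rows in a fully opened memory window."""
    return (jmt - 2) + 2 * jbuf


def window_index(jrow, jbuf):
    """Memory window index j holding global latitude row jrow.

    Assumption: the offset jbuf - 1 reproduces the two cases in the text
    (jrow=1 -> j=1 for jbuf=1, and jrow=1 -> j=2 for jbuf=2).
    """
    return jrow + (jbuf - 1)


jmt = 60  # hypothetical number of latitude rows

# Second order window: jbuf=1, so jmw=jmt and local and global indices agree.
print(window_rows(jmt, 1), window_index(1, 1))

# Fourth order window: jbuf=2, so jmw=jmt+2 and indices are shifted by one.
print(window_rows(jmt, 2), window_index(1, 2))
```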
Yes. All ``j-loops'' could be replaced by one large ``j-loop'' wrapped around all routines needed to compute internal modes and tracers. However, there are problems with this technique when message passing on multiple processors. Essentially, the computation becomes ``serialized'' when processor ``n'' must wait for adjacent processor ``n-1'' to finish before inter-processor communication can be used to extract correct quantities at domain boundaries. This happens, for instance, when quantities are computed incorrectly at domain boundaries because the proper inputs are unavailable. The standard solution is to replace incorrectly computed quantities with correct ones from adjacent domains via inter-processor communication. Alternatively, the memory window can be bumped up to the next higher order to include more buffer rows. Another problem with the single ``j-loop'' technique is that the arrangement is not flexible enough to cover all possibilities. For instance, option pressure_gradient_average, which requires tracers to be solved to the north of velocity cells before baroclinic velocities are solved, is problematic.
Yes. If a new set of indices ``jm1,jc,jp1'' is introduced to refer to rows ``j-1,j,j+1'' in the memory window, then instead of moving data, the new indices can be rotated. This technique would also save memory. There are two disadvantages: for higher order schemes, new indices need to be introduced (i.e. ``jm2'' and ``jp2'' for fourth order schemes); and the scheme is prone to errors. For instance, if an array is referenced with ``j+1'' instead of ``jp1'', then errors result.
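The index rotation described above can be sketched as follows. This is an illustration only, not MOM code: the window is reduced to three slots, each latitude row is represented by a single number, and the helper name ``rotate'' is hypothetical.

```python
def rotate(jm1, jc, jp1):
    """Rotate slot indices instead of moving data.

    The old centre slot becomes the southern slot, the old northern slot
    becomes the centre, and the old southern slot is reused for the next
    row to be loaded.
    """
    return jc, jp1, jm1


# Hypothetical data: latitude row jrow holds the value 10*jrow.
rows = {jrow: 10 * jrow for jrow in range(1, 7)}

window = [rows[1], rows[2], rows[3]]  # three slots of the memory window
jm1, jc, jp1 = 0, 1, 2                # slots holding rows j-1, j, j+1

seen = []
for jrow in range(2, 6):              # rows that have both neighbours
    # Always reference slots through jm1, jc, jp1, never through j-1 or j+1.
    seen.append((window[jm1], window[jc], window[jp1]))
    if jrow + 2 in rows:
        jm1, jc, jp1 = rotate(jm1, jc, jp1)
        window[jp1] = rows[jrow + 2]  # only the new northern row is loaded

print(seen)
```

After the first rotation the slots no longer sit in latitude order, which is why indexing a slot with ``j+1'' instead of ``jp1'' gives wrong results.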