
   
12.4 The distributed memory paradigm

In a distributed memory paradigm, each processor has its own local memory, which is not shared with other processors. An array on one processor therefore cannot be dimensioned larger than that processor's local memory, and a processor cannot directly access arrays held in another processor's memory. Such access is possible only by making "communication" calls that transfer data between processors.

MOM uses a distributed memory paradigm. The method builds on the coarse grained approach and assumes that both the baroclinic and barotropic parts are divided among processors with distributed memories. "Communication" calls must therefore be added to exchange boundary cells between processors at critical points within the code. These calls are made via a message passing module which supports both the SHMEM and MPI protocols. For details, refer to http://www.gfdl.gov/~vb. The advantages of the distributed paradigm are:

1. Higher parallel efficiency is attainable.

2. No complicated "ifdef" structure is needed to partition code differently for uni-tasking or multi-tasking on shared or distributed memory platforms.

3. Only two time levels are required for 3-D prognostic data on disk (or ramdrive) as opposed to three time levels with the coarse grained shared memory method.
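
The exchange of boundary cells between processors described above can be illustrated with a small simulation. The following is a minimal sketch, not MOM code: processors are simulated as entries in a Python list, the field is one-dimensional, and the function names (decompose, exchange_halos, laplacian_local) are hypothetical. A real implementation would replace the copies in exchange_halos with SHMEM or MPI calls made through the message passing module.

```python
# Hypothetical sketch: each list entry stands in for one processor's local memory.
# No real SHMEM or MPI calls are made; the "communication" is a plain copy.

def decompose(global_field, nproc):
    """Split a 1-D field into nproc contiguous slabs, each padded with
    one ghost (boundary) cell on either side."""
    base, extra = divmod(len(global_field), nproc)
    slabs, start = [], 0
    for p in range(nproc):
        size = base + (1 if p < extra else 0)
        interior = global_field[start:start + size]
        slabs.append([None] + interior + [None])  # ghost cells start empty
        start += size
    return slabs

def exchange_halos(slabs):
    """Fill each slab's ghost cells from the edge interior cells of its
    neighbours. This is the step that requires communication."""
    for p in range(len(slabs)):
        if p > 0:
            slabs[p][0] = slabs[p - 1][-2]   # neighbour's last interior cell
        if p < len(slabs) - 1:
            slabs[p][-1] = slabs[p + 1][1]   # neighbour's first interior cell
    return slabs

def laplacian_local(slab):
    """Second difference on every interior cell, using only local memory.
    Cells next to the physical boundary (empty ghost cell) return None."""
    out = []
    for i in range(1, len(slab) - 1):
        left, right = slab[i - 1], slab[i + 1]
        if left is None or right is None:
            out.append(None)
        else:
            out.append(left - 2 * slab[i] + right)
    return out
```

After exchange_halos has run, each simulated processor can apply laplacian_local to its own slab without touching any other slab, and concatenating the per-slab results gives the same values as a single-processor calculation on the whole field. The sketch assumes at least one interior cell per processor.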

Most options have been tested in parallel. It bears stating that when an option is parallelized, the answers are the same to the last bit of machine precision regardless of the number of processors used (footnote 12.2). Some options will probably never be parallelized. One example is the stream function method, which requires land mass perimeter integrals that cut across processors in complicated ways, a recipe for poor scaling. The implicit free surface method is better since it does not require perimeter integrals. However, it still requires global sums, which degrade scaling, although not as much as the island integrals do. The existing global sum reductions are only a crude first attempt and can be improved. However, even if global sum reductions were no problem, the accuracy of the method depends on the number of iterations, which is tied to the grid size (footnote 12.3). The best scaling is achieved by the explicit free surface option, which requires no global sum reductions and whose number of iterations (sub-cycles) depends on the ratio of internal to external gravity wave speed, independent of the number of grid cells.
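
The requirement that answers agree to the last bit regardless of the number of processors is demanding for global sums, because floating-point addition is not associative: summing per-processor partial sums generally gives a result that depends on how the data are divided. The sketch below illustrates the problem and one possible remedy. It is an assumption-laden illustration, not a description of how MOM performs its reductions: the function names are hypothetical, and the remedy shown (exact rational arithmetic with a single final rounding) is chosen only because its order independence is easy to verify.

```python
from fractions import Fraction

def split(values, nproc):
    """Contiguous block split of values among nproc simulated processors."""
    base, extra = divmod(len(values), nproc)
    chunks, start = [], 0
    for p in range(nproc):
        size = base + (1 if p < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks

def float_sum(xs):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in xs:
        total += x
    return total

def naive_global_sum(values, nproc):
    """Floating-point partial sum on each processor, followed by a
    floating-point reduction. The result depends on nproc."""
    partials = [float_sum(chunk) for chunk in split(values, nproc)]
    return float_sum(partials)

def reproducible_global_sum(values, nproc):
    """Exact rational partial sums, rounded to a float only once at the
    end, so the result cannot depend on nproc."""
    partials = [sum((Fraction(v) for v in chunk), Fraction(0))
                for chunk in split(values, nproc)]
    return float(sum(partials, Fraction(0)))
```

For example, with the values 1e16, 1.0, -1e16, 1.0, the naive sum differs between one and two simulated processors, whereas the exact sum is identical for any processor count. Exact arithmetic is far too slow for production use; it serves here only to make the reproducibility property concrete.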


RC Pacanowski and SM Griffies, GFDL, Jan 2000