RTS: The FMS Regression Test Suite
This is the home page of the Regression Test Suite of the Flexible Modeling System (FMS). It introduces the RTS and provides the basis for its design. We describe a standard syntax to express an RTS experiment, and provide instructions for constructing a new experiment. We also provide links to the current entries in the RTS.
A printable (PDF) version of this document is available as /home/vb/tex/reports/rts.pdf.
The FMS Regression Test Suite (RTS) is a set of runs designed to assess FMS-based model configurations for correctness and performance. These runs will be executed continuously by Modeling Services and will be used to track performance enhancements as they are delivered. The RTS spans all the model configurations run in production, which are responsible for much of the throughput at GFDL. In addition, it includes model configurations under consideration for future runs (e.g., higher resolution), as well as other test configurations, such as the solo (Held-Suarez) atmospheric models, which are used for more direct tests of the FMS framework itself.
Modeling Services is working closely with the GFDL Model Development Teams to ensure that the RTS remains current and correctly reflects the behaviour of FMS models in production, as well as those model configurations actively in consideration for future production runs. To this end, the RTS is closely linked with the Model Development Database.
Users preparing a new experiment may prepare it for inclusion in the database in the form of an RTS entry as described in xml. This may then be given to the associated Liaison from Modeling Services, who then takes responsibility for performing the RTS integrity tests, verifying that the source has been tagged in a manner that guarantees indefinite reproduction, and then creating the entry in the database.
At a certain stage in the evolution of a modeling system, there is a shift in emphasis from development to production, and it becomes necessary to be systematic about tracking and understanding changes. Changes can arise from a variety of sources:
The RTS is designed to be the system for delivering objective information about changes in FMS-based model configurations, and maintaining a timeline of this information. An RTS experiment guarantees repeatable, bitwise-identical integrations under controlled conditions defined below. Any changes, including those which produce the "same climate", need to be certified by the relevant science teams, and are represented as a new RTS experiment.
An RTS experiment is created from a valid entry in the FMS Model Development Database.
We currently define three sets of RTS integrity tests:
It is possible that the configuration of a model for the peCount test is not identical to the production configuration, where we may choose to use more efficient, but irreproducible, configurations. While this is not recommended, it is done in practice in at least one instance: the flag make_exchange_reproduce, which controls the reproducibility of the exchange grid, is set to .TRUE. here and to .FALSE. in the standard run below. Where we use non-reproducing optimizations, the cost of the reproducibility option will be documented in the standard run.
Exact matches are checked by running the resdiff script on restart files. These are short runs, typically a few simDays long.
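To illustrate what such a bitwise check amounts to (a sketch only; the actual resdiff script and its options are defined elsewhere and may work quite differently), the following compares two restart files by digest. The function names and file names here are purely illustrative:

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def restarts_match(path_a, path_b):
    """True if the two restart files are bitwise identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Any difference in even a single bit of a restart file changes the digest, so a pass on this test is a strong indication that the two integrations followed identical trajectories.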
The only performance number that ultimately matters is the throughput of the actual production run, for which the RTS is merely a proxy; we therefore try to define a run whose throughput hews closely to that of production. The only true measure of throughput is the actual time-to-solution, defined here in units of simYears/wallDay (simulated model years per day of wall-clock time). The RTS does, however, also provide other numbers useful in understanding performance. We define four sets of RTS performance tests:
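The simYears/wallDay conversion is simple arithmetic; the sketch below (the function name is ours, and a 365-day model calendar is assumed for illustration) shows it explicitly:

```python
def throughput_sim_years_per_wall_day(sim_days, wall_seconds):
    """Time-to-solution in simYears/wallDay: simulated model years
    completed per day of wall-clock time.

    Assumes a 365-day model calendar, purely for illustration.
    """
    sim_years = sim_days / 365.0        # simulated years integrated
    wall_days = wall_seconds / 86400.0  # elapsed wall-clock days
    return sim_years / wall_days
```

For example, a run that integrates 365 simDays in exactly one wall-clock day achieves 1.0 simYear/wallDay.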
We note that the future standard runscript for FMS-based model runs will be based on the RTS script, and should be identical to this standard run. In practice, however, users modify runscripts often, and for excellent reasons; it is worthwhile to check frequently against the RTS standard run to see whether one has inadvertently degraded (or enhanced) performance.
The standard run also provides information on the queue wait time associated with production runs. This information will be used to tune the queuing and scheduling on the HPCS.
In addition, we define additional instrumented runs based on performance analysis software. For the SGI platform, there are two types of performance analysis tools: Speedshop and Perfex. These are described in some detail in the SGI notes and in even more copious detail in SGI technical publications.
An RTS experiment is designed to run FMS model configurations under controlled conditions that closely resemble the production environment. We define a standard syntax for the run procedure so that all experiments are run under as similar conditions as possible. We have chosen to use XML as the syntax for describing an RTS experiment.
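As a rough sketch of how such an XML entry might be consumed programmatically, the fragment below parses a minimal hypothetical entry. All element and attribute names here (experiment, resources, npes, simDays, and the values they carry) are placeholders of our own invention; the actual RTS schema is defined in the HOWTO document:

```python
import xml.etree.ElementTree as ET

# Hypothetical RTS entry: every tag and attribute name below is
# illustrative only, not the real RTS schema.
ENTRY = """
<experiment name="example_config" platform="sgi">
  <resources npes="30" />
  <run simDays="8" />
</experiment>
"""

root = ET.fromstring(ENTRY)
name = root.get("name")                          # experiment name
npes = int(root.find("resources").get("npes"))   # processor count
sim_days = int(root.find("run").get("simDays"))  # run length
```

The advantage of a fixed XML syntax is exactly this: tools can read, validate, and launch every experiment through the same machinery, so all entries run under conditions that are as similar as possible.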
The HOWTO document describing how to set up an RTS experiment is available online.
The following entries currently exist in the RTS: