2 First section of namcouple file
The first section of namcouple uses some predefined keywords
prefixed by the $ sign to locate the related information. The $ sign
must be the first non-blank character on the line but can be in any column.
Nine keywords are used by OASIS3-MCT and six of these are optional:
- $NFIELDS: On the line below this keyword, put a number equal to (or greater than) the total
number of field entries in the second part of the namcouple. If more than one field is described on the same line, this
counts as only one entry.
- $RUNTIME: On the line below this keyword, put the total
simulated time of the run, expressed in seconds (or any other time
units as long as the same are used in all components and in the namcouple, see 2.2.7). Note that by convention the first coupling of a run occurs at the beginning of the run and all other couplings occur at a time strictly smaller than $RUNTIME.
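The coupling convention above can be illustrated with a short sketch. It assumes a single constant coupling period, a hypothetical `period` argument expressed in the same time unit as $RUNTIME, and simply lists the times at which couplings would occur.

```python
def coupling_times(runtime, period):
    """Return the coupling times of a run of length `runtime`.

    Both arguments must use the same time unit (seconds by convention).
    `period` is a hypothetical constant coupling period used for illustration.
    """
    if runtime <= 0 or period <= 0:
        raise ValueError("runtime and period must be positive")
    # First coupling at time 0; every later coupling is strictly before runtime.
    return list(range(0, runtime, period))


times = coupling_times(runtime=86400, period=3600)
print(len(times), times[0], times[-1])  # 24 0 82800
```

With a one-day run and an hourly period, the last coupling happens at 82800 and no coupling happens at 86400 itself.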
- $NLOGPRT: The first, second and third numbers on the line below
this keyword refer to (i) the debug verbosity, (ii) internal timing statistics
and (iii) component load balancing analysis. The information is written by OASIS3-MCT
for each component and (optionally) for each process.
The first number (that can be modified at runtime with the oasis_set_debug routine, see section
2.2.9) may be:
- 0 : production mode. One file debug.root.xx is opened by the master process of
each component and one file debug_notroot.xx is opened for all the
other processes of each component to write only error information.
- 1 : one file debug.root.xx is opened by the master process of
each component to write information equivalent to level 10 (see
below) and also to write memory usage information;
one file debug_notroot.xx is opened for all the other
processes of each component to write error information.
- 2 : one file debug.yy.xxxxxx is opened by each process of each
component (with “yy” being the component number and “xxxxxx” the process number)
to write normal production diagnostics and memory usage information
- 5 : as for 2 with in addition some initial debug info
- 10: as for 5 with in addition the routine calling tree
- 12: as for 10 with in addition some routine calling notes
- 15: as for 12 with even more debug diagnostics
- 20: as for 15 with in addition some extra runtime analysis
- 30: full debug information
The second number defines how time statistics are written out to
file comp_name.timers_xxxx (with comp_name being the component name, see section 2.2.2); it can be:
- 0 : nothing is calculated or written.
- 1 : some time statistics are calculated and written in a
single file by processor 0, together with the min and the max
times over all the processors.
- 2 : some time statistics are calculated and each processor
writes its own file; processor 0 also writes the min and the max
times over all the processors in its file.
- 3 : some time statistics are calculated and each processor
writes its own file; processor 0 also writes in its file the min
and the max times over all processors, as well as
all the results for each processor.
For more information on the time statistics written out, see section
The third number (new in OASIS3-MCT_5.0) can be set to 1 to activate a load balancing diagnostic.
Efficient use of the allocated computing resources in a coupled system requires harmonising the speed of the components. This operation, called load balancing, is often neglected, either because resources appear abundant or because of practical difficulties.
To facilitate this work, OASIS3-MCT can output the full timeline of all coupling-related events, for any of the allocated resources. This timeline is saved in one netCDF file per coupled component (timeline_XXX_component.nc). It provides the comprehensive sequence of all operations related to the coupling (field exchange through MPI, field output on disk, field interpolation and mapping, field reading on disk, restart writing, initialisation and termination phase of the OASIS3-MCT setup) so that any simulation slowdown linked to the use of the OASIS library can be identified.
The analysis of the coupling field exchanges, amongst all the
coupling events, not only identifies the resources wasted by components that are recurrently waiting for their coupling fields, but also reveals other bottlenecks such as disk access, OS interruptions or model internal load imbalance. The full picture of these events makes optimum load balancing possible, even for the most complex configurations.
For detailed information on load balancing analysis and timeline visualisation, see (Maisonnave et al 2020) and (Piacentini and Maisonnave 2020), respectively.
In addition to the timeline, computing information (time to solution, speed, cost) and a synthesis of the time spent in MPI routines for each coupled component can also help, in simpler cases, to allocate resources in a balanced way (see file load_balancing_info.txt).
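As a reading aid for the three $NLOGPRT numbers, the sketch below splits such a line into named settings. The names `LogPrt` and `parse_nlogprt` are hypothetical, and the sketch insists on all three numbers being present, which may be stricter than OASIS3-MCT itself.

```python
from dataclasses import dataclass

# Timing modes as listed for the second $NLOGPRT number.
TIMING_MODES = {
    0: "nothing calculated or written",
    1: "single file written by processor 0, with min and max times",
    2: "one file per processor; processor 0 adds min and max times",
    3: "one file per processor; processor 0 adds min, max and all results",
}


@dataclass
class LogPrt:
    debug_level: int      # first number: debug verbosity
    timing_mode: int      # second number: time statistics
    load_balancing: bool  # third number: 1 activates the diagnostic


def parse_nlogprt(line):
    """Split the line below $NLOGPRT into its three documented numbers."""
    values = [int(token) for token in line.split()]
    if len(values) != 3:
        # Assumption of this sketch only: require all three numbers.
        raise ValueError("expected three integers, got %d" % len(values))
    debug, timing, balance = values
    if timing not in TIMING_MODES:
        raise ValueError("timing mode must be one of 0, 1, 2, 3")
    return LogPrt(debug, timing, balance == 1)


settings = parse_nlogprt("2 1 1")
print(settings.debug_level, TIMING_MODES[settings.timing_mode], settings.load_balancing)
```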
- $NCDFTYP: Optional (new in OASIS3-MCT_5.0); on the line below
this keyword is a character string that indicates the NetCDF file type
for all (i.e. mapping, restart, output) new NetCDF files generated by OASIS3-MCT.
The options are cdf1, cdf2, and cdf5. The mode cdf1 is also known as classic mode;
cdf2, also known as large file format or 64bit_offset, supports larger
files; cdf5, also known as 64bit_data, supports both larger files
and larger variables. More information about these file types can
be found in the NetCDF documentation. Because cdf5 may not be available in
all NetCDF installations, use of this option requires that the C preprocessor
directive CDF_64BIT_DATA be set when compiling OASIS3-MCT. If that
preprocessor directive is not set, cdf5 is not a valid option.
The file format for any NetCDF file can be diagnosed by using “ncdump -k filename”.
$NCDFTYP only affects new files created by OASIS3-MCT. NetCDF will read and/or
overwrite existing files of any NetCDF file type, and the file type will remain
unchanged in that case.
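When ncdump is not at hand, the type of an existing file can also be guessed from its first four bytes, since files in the classic NetCDF formats start with the characters CDF followed by a version byte (1, 2 or 5). The sketch below is only an illustration of that check; the function name `ncdftyp_of` is hypothetical, and any other file (for instance an HDF5-based NetCDF-4 file) is reported as None.

```python
import os
import tempfile

# Signature bytes of the classic NetCDF formats, mapped to $NCDFTYP values.
SIGNATURES = {b"CDF\x01": "cdf1", b"CDF\x02": "cdf2", b"CDF\x05": "cdf5"}


def ncdftyp_of(path):
    """Return the $NCDFTYP value matching an existing file, or None."""
    with open(path, "rb") as handle:
        magic = handle.read(4)
    return SIGNATURES.get(magic)


# Demo on a throwaway file that only carries the 4-byte signature.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.nc")
    with open(sample, "wb") as handle:
        handle.write(b"CDF\x02")
    print(ncdftyp_of(sample))  # cdf2
```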
- $NUNITNO: Optional (new in OASIS3-MCT_4.0); on the line below this keyword are two integers
that indicate the minimum and maximum unit numbers to be used for
input and output files in the coupling layer. The user should
choose values that will NOT conflict or overlap with unit numbers in
use in any of the component models. The defaults are 1024 for the minimum and 9999
for the maximum unit number if not explicitly set by the user.
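A quick way to check a candidate $NUNITNO range against the unit numbers used by the component models is sketched below. The function name and the list of component unit numbers are hypothetical; only the default range 1024 to 9999 comes from the text above.

```python
DEFAULT_UNIT_RANGE = (1024, 9999)  # defaults stated for $NUNITNO


def conflicting_units(component_units, unit_range=DEFAULT_UNIT_RANGE):
    """Return the component unit numbers that fall inside the coupler range."""
    lowest, highest = unit_range
    if lowest > highest:
        raise ValueError("minimum unit number exceeds maximum")
    return sorted(u for u in set(component_units) if lowest <= u <= highest)


# Hypothetical unit numbers opened by the component models.
print(conflicting_units([10, 1024, 2000, 10000]))              # [1024, 2000]
print(conflicting_units([10, 1024, 2000, 10000], (500, 600)))  # []
```

An empty result means the chosen range does not collide with any of the listed units.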
- $NMAPDEC: Optional (new in OASIS3-MCT_4.0); on the line below this keyword is a character string
that indicates the mapping decomposition value to be used during local mapping. The
options are decomp_1d and decomp_wghtfile. Option decomp_1d decomposes the grid in a simple
one dimensional way while decomp_wghtfile decomposes the grid using the
information in the remapping weight file to reduce mapping communication. Option decomp_wghtfile
will take some extra time in initialization but it should result in faster mapping.
The default is decomp_1d but it is recommended to test decomp_wghtfile to see if that
option improves performance. More details can be found in (Craig et al 2018) and in (Valcke et al 2018).
- $NMATXRD: Optional (new in OASIS3-MCT_4.0); on the line below this keyword is a character string
that indicates the method used to read remapping weights. There are two options, orig
and ceg. In both, the weights are read in chunks by the root process. In the orig option,
the weights are then broadcast to all processes and each process saves the weights it needs in
order to be consistent with the mapping decomposition. In the ceg option, the root process
reads the weights and then decides which process each weight should be assigned to. A
series of exchanges is then done and just the weights needed on
each process are sent. The orig method sends much more data but is more parallel. The ceg
method does most of the work on the root process but less data is communicated. The default option
is ceg. More details can be found in (Craig et al 2018).
- $NWGTOPT : Optional (new in OASIS3-MCT_4.0); on the line below this keyword is a character string
that indicates how to handle bad remapping weights. There are four options:
abort_on_bad_index, ignore_bad_index, ignore_bad_index_silently, and
use_bad_index. Bad weights are defined as weights in the mapping file for which either
the source or destination index is out of bounds relative to the number of grid cells
in the grid; in that case, the weight is referencing a gridcell that does not physically
exist. Note that an index equal to zero will not be considered as a bad index if the associated weight
is also zero. There are other situations where the value of the actual mapping weight is
scientifically incorrect, but this is not easy to detect and is not dealt with in OASIS3-MCT.
- abort_on_bad_index will write error messages to the log files and abort if a bad weight
index is detected. This is the default option.
- ignore_bad_index will write an error message and then remove bad
weights internally before continuing.
- ignore_bad_index_silently will remove bad weights and continue without writing an error message.
- use_bad_index will attempt to keep bad weights in the interpolation computation,
but this can result in memory corruption, silent dropping of weights, and incorrect results ; this is not recommended.
Note that the ability to check mapping files at runtime in OASIS3-MCT is limited. It is always
recommended that mapping files be analyzed offline before long production runs are carried out.
Checks can be done to make sure the source and destination indices are valid, that weight values
are reasonable (for instance, between 0 and 1, although this will depend on the mapping method),
and that the sum of weights on each destination cell is reasonable (for instance, 1, in many cases).
In addition, offline tests can be run with analytical functions to verify conservation, gradient
preserving features and other characteristics associated with the particular mapping approach.
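The offline checks described above could look like the following sketch. It assumes the weights have already been read into three parallel lists, that indices are 1-based, and that a destination sum of 1 is expected; the function names and the tolerance are hypothetical, and this is not the check implemented inside OASIS3-MCT.

```python
def index_ok(index, size, weight):
    """An index is valid if in bounds, or if it is 0 with a zero weight."""
    return 1 <= index <= size or (index == 0 and weight == 0.0)


def check_weights(src_idx, dst_idx, weights, n_src, n_dst, tol=1e-6):
    """Return (positions of bad entries, destination sums that differ from 1)."""
    bad = []
    sums = {}
    for pos, (src, dst, w) in enumerate(zip(src_idx, dst_idx, weights)):
        if not (index_ok(src, n_src, w) and index_ok(dst, n_dst, w)):
            bad.append(pos)
            continue
        if dst >= 1:
            sums[dst] = sums.get(dst, 0.0) + w
    off = {dst: total for dst, total in sums.items() if abs(total - 1.0) > tol}
    return bad, off


# Toy data: the last entry points at source cell 5 of a 4-cell grid.
bad, off = check_weights(
    src_idx=[1, 2, 3, 0, 5],
    dst_idx=[1, 1, 2, 2, 3],
    weights=[0.5, 0.5, 0.9, 0.0, 1.0],
    n_src=4,
    n_dst=3,
)
print(bad, off)  # [4] {2: 0.9}
```

Here the entry at position 4 is flagged as a bad index, the zero-index entry with zero weight is accepted, and destination cell 2 is reported because its weights sum to 0.9.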
- $NNOREST: Optional (new in OASIS3-MCT_4.0); on the line below this keyword is a character
string that can override the requirement that restart files must exist
if they are needed. If the character string value starts with T, t, .T,
or .t (as in true), then OASIS3-MCT will initialise with zero any variable that normally requires
a restart (for instance, variables with LAG 0) if the restart file does not exist. By default, missing
restart files will cause the model to abort. It is strongly recommended
that this keyword NOT be used in production runs. It exists to provide a
quick shortcut for running technical tests.
Note that if $NNOREST is true but the restart file nonetheless exists, it will be used.
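The rule for recognising a true $NNOREST value can be written down in a few lines. This is a sketch of the rule as stated above, not the OASIS3-MCT source; the function name is hypothetical and stripping surrounding blanks is an assumption.

```python
def norest_enabled(value):
    """Return True if the $NNOREST string starts with T, t, .T or .t."""
    text = value.strip()  # assumption: surrounding blanks are ignored
    return text.startswith(("T", "t", ".T", ".t"))


for candidate in ["T", "true", ".true.", "F", ".false."]:
    print(candidate, norest_enabled(candidate))
```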
- Keywords $SEQMODE, $CHANNEL, $JOBNAME, $NBMODEL, $INIDATE, $MODINFO, $CALTYPE are no longer used.
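To make the layout of this first section concrete, the sketch below reads a small, made-up fragment and maps each keyword to the line that follows it. The sample values and the function name `read_first_section` are hypothetical, and the sketch ignores any other conventions a real namcouple file may follow.

```python
SAMPLE = """\
 $NFIELDS
   2
     $RUNTIME
   86400
 $NLOGPRT
   1 0 0
 $NNOREST
   F
"""


def read_first_section(text):
    """Map each $KEYWORD to the stripped content of the line below it."""
    settings = {}
    lines = text.splitlines()
    for number, line in enumerate(lines):
        stripped = line.strip()
        # The $ sign is the first non-blank character, in any column.
        if not stripped.startswith("$"):
            continue
        keyword = stripped.split()[0]
        following = lines[number + 1].strip() if number + 1 < len(lines) else ""
        # A following line that is itself a keyword carries no value.
        settings[keyword] = "" if following.startswith("$") else following
    return settings


config = read_first_section(SAMPLE)
print(config["$NFIELDS"], config["$RUNTIME"], config["$NLOGPRT"].split())
```

The $RUNTIME line in the sample is deliberately indented further than the others to show that the column of the $ sign does not matter.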