The OASIS Coupler Forum

Problem running spoc_communication example

Posted by Anonymous at January 9 2025

Hi,
I'm new to OASIS3-MCT. I have compiled OASIS3-MCT version 5.2 with
Apptainer> gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
and Open MPI
Apptainer> mpiexec --version
mpiexec (OpenRTE) 4.1.6

I'm trying to run the spoc_communication example. I copied the *.F90_oa files to *.F90 and slightly modified run_spoc_oa for my setup:

Apptainer> diff run_spoc_oa run_spoc_oa.ori 
20c20
< arch=apptainer #pgi_openmpi_openmp_linux  # training, belenos, nemo_lenovo, mac 
---
> arch=pgi_openmpi_openmp_linux  # training, belenos, nemo_lenovo, mac 
61,62d60
< elif [ $arch == apptainer ]; then
<     MPIRUN=mpiexec
151c149
< if [ $arch == apptainer ] || [ $arch == training ] || [ $arch == gfortran_openmpi_openmp_linux ] || [ $arch == gnu1020_openmpi_openmp_linux ] || [ $arch == pgi_openmpi_openmp_linux ] || [ $arch == pgi20.4_openmpi_openmp_linux ]; then
---
> if [ $arch == training ] || [ $arch == gfortran_openmpi_openmp_linux ] || [ $arch == gnu1020_openmpi_openmp_linux ] || [ $arch == pgi_openmpi_openmp_linux ] || [ $arch == pgi20.4_openmpi_openmp_linux ]; then

Now I'm getting the following error when running run_spoc_oa:

Apptainer> ./run_spoc_oa
en_US.UTF-8: unknown locale
*****************************************************************
*** spoc_communication : 

Rundir       : /nesi/nobackup/nesi99999/pletzera/tmp/spoc_communication/work_spoc_communication_4_4
Architecture : apptainer
Host         : wbn001
User         : pletzera

ocean runs on 4 processes
atmos runs on 4 processes

Executing the model using mpiexec
 (oasis_unitsetmin)        1024
 (oasis_unitsetmin)        1024
 (oasis_unitsetmax)        9999
 -- ENTER (oasis_string_listGetNum)
...
 ---- EXIT  (oasis_mpi_chkerr)
 -- EXIT  (oasis_mpi_bcastl1)
 (oasis_unitget)        9999
 (oasis_unitget)        9999
 (oasis_unitget)        9999
 (oasis_unitget)        9999
 (oasis_unitget)        9999
 (oasis_unitget)        9999
 (oasis_unitget)        9999
 (oasis_unitget)        9999
 (oasis_abort) ABORT: CALLING ABORT FROM OASIS LAYER NOW
 (oasis_abort) ABORT: See the log files in the run directory
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI COMMUNICATOR 6 SPLIT FROM 4
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
 (oasis_abort) ABORT: CALLING ABORT FROM OASIS LAYER NOW
 (oasis_abort) ABORT: See the log files in the run directory
 (oasis_abort) ABORT: CALLING ABORT FROM OASIS LAYER NOW
 (oasis_abort) ABORT: See the log files in the run directory
[wbn001:121039] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[wbn001:121039] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
spoc_communication is executed or submitted to queue.
Results are found in rundir : /nesi/nobackup/nesi99999/pletzera/tmp/spoc_communication/work_spoc_communication_4_4


I looked into work_spoc_communication_4_4/nout.000000, work_spoc_communication_4_4/atmos.out* and work_spoc_communication_4_4/ocean.out* but could not find any error message.

Apptainer> tail work_spoc_communication_4_4/nout.000000 work_spoc_communication_4_4/atmos.out_100 work_spoc_communication_4_4/ocean.out_100 
==> work_spoc_communication_4_4/nout.000000 <==
 (oasis_init_comp) compnm gather            2 ocean_component T
 (oasis_init_comp) compnm gather            3 ocean_component T
 (oasis_init_comp) compnm gather            4 ocean_component T
 (oasis_init_comp) compnm gather            5 atmos_component T
 (oasis_init_comp) compnm gather            6 atmos_component T
 (oasis_init_comp) compnm gather            7 atmos_component T
 (oasis_init_comp) compnm gather            8 atmos_component T
 (oasis_init_comp)   COUPLED models            1 ocean_component T
 (oasis_init_comp)   COUPLED models            2 atmos_component T
 (oasis_init_comp)cdnam :ocean_component mynummod :           1

==> work_spoc_communication_4_4/atmos.out_100 <==
 -----------------------------------------------------------
 I am atmos process with rank :           0
 in my local communicator gathering            4 processes
 ----------------------------------------------------------
 Local partition definition
 il_extentx, il_extenty, il_size, il_offsetx, il_offsety, il_offset =           96          18        1728           0           0           0
 ig_paral =            1           0        1728
 grid_lat_atmos maximum and minimum  -46.901409149169922       -90.000000000000000     
 var_id FRECVATM, var_id FSENDATM           1           2
 End of initialisation phase

==> work_spoc_communication_4_4/ocean.out_100 <==
 -----------------------------------------------------------
 I am ocean process with rank :           0
 in my local communicator gathering            4 processes
 ----------------------------------------------------------
 Local partition definition
 il_extentx, il_extenty, il_size, il_offsetx, il_offsety, il_offset =          182          37        6734           0           0           0
 ig_paral =            1           0        6734
 grid_lat_ocean maximum and minimum  -50.059176404466399       -78.190584955855201     
 var_id FRECVOCN, var_id FSENDOCN           1           2
 End of initialisation phase



Thanks for your help

Posted by Anonymous at January 9 2025

Hi,

To get the OASIS debug files, you have to set the first number on the line below the $NLOGPRT keyword in your namcouple to a value > 0. For example, you can rerun with
$NLOGPRT
    30 0
in your namcouple to get the full debug information written to files named debug_*.
Those files should give you more information about your abort.
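
If you would rather script the change than edit the namcouple by hand, a minimal shell sketch could look like the one below. The helper name set_nlogprt is hypothetical (it is not part of OASIS3-MCT), and the sketch assumes that the values sit on the first non-comment, non-blank line after the $NLOGPRT keyword.

```shell
# Hypothetical helper (not part of OASIS3-MCT): replace the value line
# that follows the $NLOGPRT keyword in a namcouple file.
# Assumption: the values are on the first line after the keyword that is
# neither blank nor a comment starting with '#'.
# Usage: set_nlogprt <namcouple-file> "<values>"   e.g. "30 0"
set_nlogprt() {
    file=$1
    values=$2
    tmp=$(mktemp) || return 1
    awk -v vals="$values" '
        # Keyword already seen: overwrite the first real (non-comment) line.
        pending && !/^[ \t]*(#|$)/ { print "    " vals; pending = 0; next }
        # Remember that the next real line holds the NLOGPRT values.
        /^[ \t]*\$NLOGPRT/ { pending = 1 }
        # Everything else is copied through unchanged.
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, `set_nlogprt namcouple "30 0"` followed by a rerun of `./run_spoc_oa` should produce the debug_* files. If the run script copies the namcouple into the run directory at each launch, apply the change to the source copy rather than to the one in the run directory.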
  I hope this helps,
 Sophie