Hi,

I'm new to oasis3-mct. I have compiled oasis3_mct version 5.2 with

    Apptainer> gcc --version
    gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

and openmpi

    Apptainer> mpiexec --version
    mpiexec (OpenRTE) 4.1.6

I'm trying to run example spoc_communication. I copied the *.F90_oa files to *.F90 and slightly modified run_spoc_oa for my needs:

    Apptainer> diff run_spoc_oa run_spoc_oa.ori
    20c20
    < arch=apptainer #pgi_openmpi_openmp_linux # training, belenos, nemo_lenovo, mac
    ---
    > arch=pgi_openmpi_openmp_linux # training, belenos, nemo_lenovo, mac
    61,62d60
    < elif [ $arch == apptainer ]; then
    < MPIRUN=mpiexec
    151c149
    < if [ $arch == apptainer ] || [ $arch == training ] || [ $arch == gfortran_openmpi_openmp_linux ] || [ $arch == gnu1020_openmpi_openmp_linux ] || [ $arch == pgi_openmpi_openmp_linux ] || [ $arch == pgi20.4_openmpi_openmp_linux ]; then
    ---
    > if [ $arch == training ] || [ $arch == gfortran_openmpi_openmp_linux ] || [ $arch == gnu1020_openmpi_openmp_linux ] || [ $arch == pgi_openmpi_openmp_linux ] || [ $arch == pgi20.4_openmpi_openmp_linux ]; then

Now I'm getting the following error when running run_spoc_oa:

    Apptainer> ./run_spoc_oa
    en_US.UTF-8: unknown locale
    *****************************************************************
    *** spoc_communication :
    Rundir : /nesi/nobackup/nesi99999/pletzera/tmp/spoc_communication/work_spoc_communication_4_4
    Architecture : apptainer
    Host : wbn001
    User : pletzera
    ocean runs on 4 processes
    atmos runs on 4 processes
    Executing the model using mpiexec
    (oasis_unitsetmin) 1024
    (oasis_unitsetmin) 1024
    (oasis_unitsetmax) 9999
    -- ENTER (oasis_string_listGetNum)
    ...
    ---- EXIT (oasis_mpi_chkerr)
    -- EXIT (oasis_mpi_bcastl1)
    (oasis_unitget) 9999
    (oasis_unitget) 9999
    (oasis_unitget) 9999
    (oasis_unitget) 9999
    (oasis_unitget) 9999
    (oasis_unitget) 9999
    (oasis_unitget) 9999
    (oasis_unitget) 9999
    (oasis_abort) ABORT: CALLING ABORT FROM OASIS LAYER NOW
    (oasis_abort) ABORT: See the log files in the run directory
    --------------------------------------------------------------------------
    MPI_ABORT was invoked on rank 3 in communicator MPI COMMUNICATOR 6 SPLIT FROM 4
    with errorcode 1.
    NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
    You may or may not see output from other processes, depending on
    exactly when Open MPI kills them.
    --------------------------------------------------------------------------
    (oasis_abort) ABORT: CALLING ABORT FROM OASIS LAYER NOW
    (oasis_abort) ABORT: See the log files in the run directory
    (oasis_abort) ABORT: CALLING ABORT FROM OASIS LAYER NOW
    (oasis_abort) ABORT: See the log files in the run directory
    [wbn001:121039] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
    [wbn001:121039] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
    spoc_communication is executed or submitted to queue.
    Results are found in rundir : /nesi/nobackup/nesi99999/pletzera/tmp/spoc_communication/work_spoc_communication_4_4

I looked into work_spoc_communication_4_4/nout.000000, work_spoc_communication_4_4/atmos.out* and work_spoc_communication_4_4/ocean.out* but could not find any error message.
    Apptainer> tail work_spoc_communication_4_4/nout.000000 work_spoc_communication_4_4/atmos.out_100 work_spoc_communication_4_4/ocean.out_100
    ==> work_spoc_communication_4_4/nout.000000 <==
    (oasis_init_comp) compnm gather 2 ocean_component T
    (oasis_init_comp) compnm gather 3 ocean_component T
    (oasis_init_comp) compnm gather 4 ocean_component T
    (oasis_init_comp) compnm gather 5 atmos_component T
    (oasis_init_comp) compnm gather 6 atmos_component T
    (oasis_init_comp) compnm gather 7 atmos_component T
    (oasis_init_comp) compnm gather 8 atmos_component T
    (oasis_init_comp) COUPLED models 1 ocean_component T
    (oasis_init_comp) COUPLED models 2 atmos_component T
    (oasis_init_comp)cdnam :ocean_component mynummod : 1

    ==> work_spoc_communication_4_4/atmos.out_100 <==
    -----------------------------------------------------------
    I am atmos process with rank : 0
    in my local communicator gathering 4 processes
    ----------------------------------------------------------
    Local partition definition
    il_extentx, il_extenty, il_size, il_offsetx, il_offsety, il_offset = 96 18 1728 0 0 0
    ig_paral = 1 0 1728
    grid_lat_atmos maximum and minimum -46.901409149169922 -90.000000000000000
    var_id FRECVATM, var_id FSENDATM 1 2
    End of initialisation phase

    ==> work_spoc_communication_4_4/ocean.out_100 <==
    -----------------------------------------------------------
    I am ocean process with rank : 0
    in my local communicator gathering 4 processes
    ----------------------------------------------------------
    Local partition definition
    il_extentx, il_extenty, il_size, il_offsetx, il_offsety, il_offset = 182 37 6734 0 0 0
    ig_paral = 1 0 6734
    grid_lat_ocean maximum and minimum -50.059176404466399 -78.190584955855201
    var_id FRECVOCN, var_id FSENDOCN 1 2
    End of initialisation phase

Thanks for your help.
Hi,

To get the OASIS debug files, you have to set the first number on the line below the $NLOGPRT keyword in your namcouple to a value > 0. For example, you can rerun with

    $NLOGPRT
      30 0

in your namcouple to write the full debug information into files named debug_*. Those files should tell you more about your abort.

I hope this helps,
Sophie
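
For anyone who wants to script the suggestion above, here is a minimal sketch in shell. The helper names `set_nlogprt` and `scan_debug_files` are hypothetical and are not part of OASIS3-MCT or run_spoc_oa. The sketch assumes the values sit on the first non-comment line after the $NLOGPRT keyword, that comment lines start with `#`, and that a failed run mentions "abort" or "error" somewhere in its debug files. The `debug*` glob is deliberately loose because the exact debug file names may differ between versions.

```shell
#!/bin/sh
# Sketch only: helper names and the loose "debug*" glob are assumptions,
# not part of OASIS3-MCT or run_spoc_oa.

# Print a copy of a namcouple with the values under $NLOGPRT replaced.
# $1 = namcouple path, $2 = first value (debug level, e.g. 30),
# $3 = second value (e.g. 0)
set_nlogprt() {
    awk -v lvl="$2" -v second="$3" '
        # First non-blank, non-comment line after the keyword: replace it.
        pending && NF > 0 && $0 !~ /^[ \t]*#/ {
            print "  " lvl " " second
            pending = 0
            next
        }
        # Remember that the keyword was seen; the line itself is kept.
        /^[ \t]*\$NLOGPRT/ { pending = 1 }
        { print }
    ' "$1"
}

# Show lines mentioning "abort" or "error" (case-insensitive) in the
# debug files of a run directory, prefixed with file name and line number.
# $1 = run directory
scan_debug_files() {
    grep -HinE 'abort|error' "$1"/debug* 2>/dev/null
}
```

A possible way to use it: run `set_nlogprt namcouple 30 0 > namcouple.new && mv namcouple.new namcouple` on whichever namcouple your run script places in the run directory, rerun `./run_spoc_oa`, then call `scan_debug_files work_spoc_communication_4_4`. The function writes to standard output rather than editing in place, so the original file stays untouched until you decide to replace it.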