Starting with OASIS3-MCT (first steps, tutorial, ...)
Hi, I'm new to OASIS3-MCT. I have compiled OASIS3-MCT version 5.2 with ifort and MPICH. Now I am trying to run the tutorial coupled model (tutorial_oa), but I don't get the "debug" files. I checked the namcouple file, and $NLOGPRT is set to 30 3, i.e.:

$NLOGPRT
# Amount of information written to OASIS3-MCT log files (see User Guide)
  30 3
Hi, I assume that your atmos has 1 rank and your ocean has 2 ranks, but you did not use rank 0 for OASIS (maybe due to $NBMODEL); in that case you would need to run mpirun -np 3? The OASIS3-MCT debug files debug.01.00000? and debug.02.00000? (where ? goes from 0 to 3, one for each process) are written by each process of ocean and atmos respectively. See https://gitlab.com/cerfacs/oasis3-mct/-/blob/OASIS3-MCT_5.0/examples/tutorial_communication/tutorial_communication.pdf :) Best,
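To make the naming convention above concrete, here is a small illustrative helper (not part of the tutorial scripts; the function name is made up) that prints the debug file names you should expect for a given component number and process count, following the debug.<model>.<rank> pattern with the rank zero-padded to 6 digits:

```shell
# Illustrative sketch: list the OASIS3-MCT debug file names expected for
# one component, following the debug.<model>.<6-digit rank> pattern
# described in the reply above. "expected_debug_files" is a hypothetical
# helper name, not an OASIS tool.
expected_debug_files() {
    model=$1   # component number as it appears in the file name, e.g. 01
    nproc=$2   # number of MPI processes for that component
    rank=0
    while [ "$rank" -lt "$nproc" ]; do
        printf 'debug.%s.%06d\n' "$model" "$rank"
        rank=$((rank + 1))
    done
}

expected_debug_files 01 4   # ocean on 4 processes: debug.01.000000 .. debug.01.000003
expected_debug_files 02 4   # atmos on 4 processes: debug.02.000000 .. debug.02.000003
```

If any of these files are missing after a run, the corresponding process most likely never reached (or never completed) its OASIS initialisation.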
Hi, Thanks for your reply, but I'm not sure I understand it. I followed https://gitlab.com/cerfacs/oasis3-mct/-/blob/OASIS3-MCT_5.0/examples/tutorial_communication/tutorial_communication.pdf step by step, but I didn't get the debug files. I only got ocean.out.100, ocean.out.101, ocean.out.102, ocean.out.103 and atmos.out.100, atmos.out.101, atmos.out.102, atmos.out.103. Here is the "run_tutorial_oa" file. I'm using "yhrun"; maybe that's the problem?

#
############### User's section #######################################
#
## - Define architecture and coupler
arch=intel   # training, belenos, nemo_lenovo, mac
             # kraken, gfortran_openmpi_openmp_linux
             # pgi_openmpi_openmp_linux,
             # pgi20.4_openmpi_openmp_linux (not work with 4.0)
             # gnu1020_openmpi_openmp_linux (not work with 4.0)
#
# - Define number of processes to run each executable
nproc_exe1=4
nproc_exe2=4
#
############### End of user's section ################################
#
# - Define rundir
rundir=${srcdir}/work_${casename}_${nproc_exe1}_${nproc_exe2}_oa
#
echo '*****************************************************************'
echo '*** '$casename' : '$run
echo ''
echo 'Rundir       :' $rundir
echo 'Architecture :' $arch
echo 'Host         : '$host
echo 'User         : '$user
echo ''
echo $exe1' runs on '$nproc_exe1 'processes'
echo $exe2' runs on '$nproc_exe2 'processes'
echo ''
######################################################################
### 1. Create rundir and copy everything needed
#
\rm -fr $rundir
mkdir -p $rundir
cp -f $datadir/*nc $rundir/.
cp -f $srcdir/$exe1 $rundir/.
cp -f $srcdir/$exe2 $rundir/.
cp -f $datadir/namcouple_LAG $rundir/namcouple
cd $rundir
######################################################################
### 2. Definition of mpirun command and batch script
#
if [ $arch == training ]; then
    MPIRUN=/usr/local/intel/impi/2018.1.163/bin64/mpirun
elif [ $arch == gfortran_openmpi_openmp_linux ]; then
    MPIRUN=/usr/lib64/openmpi/bin/mpirun
elif [ $arch == pgi_openmpi_openmp_linux ]; then
    MPIRUN=/usr/local/pgi/linux86-64/18.7/mpi/openmpi-2.1.2/bin/mpirun
elif [ $arch == gnu1020_openmpi_openmp_linux ]; then
    MPIRUN=/usr/local/openmpi/4.1.0_gcc1020/bin/mpirun
elif [ $arch == pgi20.4_openmpi_openmp_linux ]; then
    MPIRUN=/usr/local/pgi/linux86-64/20.4/mpi/openmpi-3.1.3/bin/mpirun
elif [ $arch == intel ]; then
    MPIRUN=/usr/bin/yhrun
elif [ $arch == belenos ] ; then
    (( nproc = $nproc_exe1 + $nproc_exe2 ))
    cat <<EOF > $rundir/run_$casename.$arch
#!/bin/bash
#SBATCH --exclusive
#SBATCH --partition=normal256
#SBATCH --time=00:10:00
#SBATCH --job-name=spoc    # job name
#SBATCH -N 1               # number of nodes
#SBATCH -n $nproc          # number of procs
#SBATCH -o $rundir/$casename.o
#SBATCH -e $rundir/$casename.e
ulimit -s unlimited
cd $rundir
module load intelmpi/2018.5.274
module load intel/2018.5.274
module load netcdf-fortran/4.5.2_V2
#
export KMP_STACKSIZE=1GB
export I_MPI_WAIT_MODE=enable
#
time mpirun -np $nproc_exe1 ./$exe1 : -np $nproc_exe2 ./$exe2
#
EOF
#
elif [ ${arch} == nemo_lenovo ] ; then
    MPIRUN=mpirun
    (( nproc = $nproc_exe1 + $nproc_exe2 ))
    cat <<EOF > $rundir/run_$casename.$arch
#!/bin/bash -l
# Job name
#SBATCH --job-name spoc
# Job time limit
#SBATCH --time=00:10:00
#SBATCH --partition debug
#SBATCH --output=$rundir/$casename.o
#SBATCH --error=$rundir/$casename.e
# Number of nodes and processes
#SBATCH --nodes=1 --ntasks-per-node=$nproc
#SBATCH --distribution cyclic
cd $rundir
ulimit -s unlimited
#SPOC module purge
#SPOC module -s load compiler/intel/2015.2.164 mkl/2015.2.164 mpi/intelmpi/5.0.3.048
#
time $MPIRUN -np $nproc_exe1 ./$exe1 : -np $nproc_exe2 ./$exe2
#
EOF
elif [ ${arch} == kraken ] ; then
    (( nproc = $nproc_exe1 + $nproc_exe2 ))
    cat <<EOF > $rundir/run_$casename.$arch
#!/bin/bash -l
#SBATCH --partition debug
# Job name
#SBATCH --job-name spoc
# Job time limit
#SBATCH --time=00:10:00
#SBATCH --output=$rundir/$casename.o
#SBATCH --error=$rundir/$casename.e
# Number of nodes and processes
#SBATCH --nodes=1 --ntasks-per-node=$nproc
#SBATCH --distribution cyclic
cd $rundir
ulimit -s unlimited
module purge
module load compiler/intel/23.2.1
module load mpi/intelmpi/2021.10.0
module load lib/netcdf-fortran/4.4.4_phdf5_1.10.4
time mpirun -np $nproc_exe1 ./$exe1 : -np $nproc_exe2 ./$exe2
EOF
fi
######################################################################
### 3. Model execution or batch submission
#
if [ $arch == training ] || [ $arch == gfortran_openmpi_openmp_linux ] || [ $arch == gnu1020_openmpi_openmp_linux ] || [ $arch == pgi_openmpi_openmp_linux ] || [ $arch == pgi20.4_openmpi_openmp_linux ]; then
    export OMP_NUM_THREADS=1
    echo 'Executing the model using '$MPIRUN
    $MPIRUN -oversubscribe -np $nproc_exe1 ./$exe1 : -np $nproc_exe2 ./$exe2
elif [ $arch == belenos ]; then
    echo 'Submitting the job to queue using sbatch'
    sbatch $rundir/run_$casename.$arch
    squeue -u $user
elif [ ${arch} == nemo_lenovo ] || [ ${arch} == kraken ]; then
    echo 'Submitting the job to queue using sbatch'
    sbatch $rundir/run_$casename.$arch
    squeue -u $user
elif [ ${arch} == mac ]; then
    echo 'Executing the model using mpirun'
    ulimit -s unlimited
    mpirun --oversubscribe -np $nproc_exe1 ./$exe1 : -np $nproc_exe2 ./$exe2
elif [ ${arch} == intel ]; then
    echo 'Executing the model using yhrun'
    yhrun -N 1 -n $nproc_exe1 -p deimos ./$exe1 : -N 1 -n $nproc_exe2 -p deimos ./$exe2 --partition debug
fi
echo $casename 'is executed or submitted to queue.'
echo 'Results are found in rundir : '$rundir
#
######################################################################
Hi, What are the last lines of your ocean.out.100, ocean.out.101, ocean.out.102, ocean.out.103 and atmos.out.100, atmos.out.101, atmos.out.102, atmos.out.103 files? I suspect your run has not finished properly and the debug files have not been created yet. Regards, Sophie
Hi, I don't think it's related to yhrun etc. as you mention here. Can you just add these lines to your script:

export OASIS_DEBUG=3
export CPL_LOG=cplout_your_debug

Please check whether or not you get cplout_your_debug.01.00000 or similar files. Best,
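One detail worth stressing: these variables must be visible in the shell that actually launches the executables (the run script or batch script), not only in ~/.bashrc, since batch jobs do not always source it. A minimal sketch of placing the exports just before the launch line, with a check that a child process inherits them (the child-process check is illustrative only, not part of run_tutorial_oa):

```shell
# Sketch: export the debug variables right before the launch command in the
# run script, so every process started from this shell inherits them.
export OASIS_DEBUG=3
export CPL_LOG=cplout_your_debug

# Any yhrun/mpirun command started from here now inherits both variables.
# As an illustration, verify they are visible to a child process:
child_env=$(bash -c 'echo "$OASIS_DEBUG $CPL_LOG"')
echo "$child_env"   # prints: 3 cplout_your_debug
```

If the child process does not see the variables, the coupled executables will not see them either, and no extra debug output can be expected.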
Dear Sophie, Here is the ocean.out.103 file:

  I am ocean process with rank : 3
  in my local communicator gathering 8 processes
  ----------------------------------------------------------
  Local partition definition
  il_extentx, il_extenty, il_size, il_offsetx, il_offsety, il_offset = 182 18 3276 0 54 9828
  End of initialisation phase
  Timestep, field min and max value
      0   1.00010726631575   2.85860918572124
   3600   2.00021453263151   5.71721837144248
   7200   3.00032179894726   8.57582755716372
  10800   4.00042906526301   11.4344367428850
  End of the program

And atmos.out.104:

  I am atmos process with rank : 4
  in my local communicator gathering 8 processes
  ----------------------------------------------------------
  Local partition definition
  il_extentx, il_extenty, il_size, il_offsetx, il_offsety, il_offset = 96 9 864 0 36 3456
  End of initialisation phase
  Timestep, field min and max value
      0   1.00067973297724   2.84700434729930
   1800   2.00135946595447   5.69400869459860
   3600   3.00203919893170   8.54101304189791
   5400   4.00271893190894   11.3880173891972
   7200   5.00339866488618   14.2350217364965
   9000   6.00407839786341   17.0820260837958
  10800   7.00475813084064   19.9290304310951
  12600   8.00543786381788   22.7760347783944
  End of the program

Best,
Hi, I'm not sure where to add "export OASIS_DEBUG=3" and "export CPL_LOG=cplout_your_debug". I tried adding them to the .bashrc file and to the run_tutorial_oa file. Could you please let me know which file to add them to? Best,
Hi, you added them in ~/.bashrc and ran source ~/.bashrc? And you don't get cplout_your_debug.01.00000, right? I will check it tomorrow with mpiifort and mpiifx. Please show your namcouple, and also set:

export KMP_WARNINGS=TRUE
export KMP_VERBOSE=1
export KMP_AFFINITY=verbose
export OMP_NUM_THREADS=1

Best,
Hi, I added them in ~/.bashrc and sourced it, and I still don't get cplout_your_debug.01.00000. Here is my namcouple file:

# This is a typical input file for OASIS3-MCT.
#
# Any line beginning with # is ignored.
#
#########################################################################
$NFIELDS
# The number of fields described in the second part of the namcouple.
2
###########################################################################
$RUNTIME
# The total simulated time for this run in seconds
14400
###########################################################################
$NLOGPRT
# Amount of information written to OASIS3-MCT log files (see User Guide)
30 3 1
###########################################################################
$STRINGS
#
# Everything below has to do with the fields being exchanged.
#
######################################################
#
# Field 1: ocean to atmos
#
# First line:
# 1) and 2) Symbolic names for the field in the source and target component models
# 3) Not used anymore but still required for parsing
# 4) Exchange frequency for the field in seconds
# 5) Number of transformations to be performed by OASIS3-MCT
# 6) Coupling restart file names
# 7) Field status: EXPORTED, EXPOUT, INPUT, OUTPUT
FIELD_SEND_OCN FIELD_RECV_ATM 1 3600 1 fdocn.nc EXPOUT
#
# Second line:
# 1)-2) and 3)-4) Source and target grid first and 2nd dimensions (optional)
# 5) and 6) Source and target grid prefix (4 characters)
# 7) LAG index if needed
182 149 96 72 torc lmdz LAG=+3600
#
# Third line:
# Overlap (P or R) and nbr of overlap grid points for source and target grids.
P 2 P 0
#
# List of analyses (here only MAPPING)
MAPPING
#
# Specific parameters for each analysis (here only the name of the remapping file for MAPPING)
my_remapping_file_bilinear.nc
#
######################################################
#
# Field 2: atmos to ocean
#
FIELD_SEND_ATM FIELD_RECV_OCN 1 7200 1 fdatm.nc EXPOUT
#
96 72 182 149 lmdz torc LAG=+1800
#
P 0 P 2
#
# List of analyses (here only SCRIPR)
SCRIPR
#
# Specific parameters for SCRIPR, here specifying the parameters of the BILINEAR interpolation to be used
BILINEAR LR SCALAR LATLON 1
#
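When checking a namcouple by hand, it helps to see exactly which non-comment line follows a given keyword, since that is the value the library will parse. A small illustrative sketch (the function name and the /tmp/namcouple_demo fragment are made up for demonstration; OASIS3-MCT's own parser remains authoritative):

```shell
# Sketch: print the first non-comment line after a given $KEYWORD in a
# namcouple file (lines starting with # are ignored, as in namcouple syntax).
namcouple_value() {
    keyword=$1
    file=$2
    awk -v kw="$keyword" '
        $1 == kw { found = 1; next }          # remember we passed the keyword
        found && $0 !~ /^[[:space:]]*#/ {     # first non-comment line after it
            print; exit
        }
    ' "$file"
}

# Tiny hypothetical namcouple fragment, for demonstration only.
cat > /tmp/namcouple_demo <<'EOF'
$NLOGPRT
# Amount of information written to OASIS3-MCT log files
30 3
$RUNTIME
# The total simulated time for this run in seconds
14400
EOF

namcouple_value '$NLOGPRT' /tmp/namcouple_demo   # prints: 30 3
namcouple_value '$RUNTIME' /tmp/namcouple_demo   # prints: 14400
```

Running this against the real namcouple makes it easy to confirm the $NLOGPRT values actually in effect before hunting for missing debug files.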
Hi, this has not been fixed yet. A debugging question for Sophie: what compiler flags is she using? Best, Subhadeep