Hi, While running a coupled model involving NEMO (vn3.6 in this instance), the NEMO code calls OASIS_ABORT, for perfectly legitimate reasons. However, the MPI_ABORT call within OASIS_ABORT always returns an error code of zero to the calling environment (from mod_oasis_sys.F90): CALL MPI_ABORT (mpi_comm_global, 0, ierror). It's important that in situations like this our environment can recognise that the coupled model has not completed successfully, so that it does not attempt any subsequent operations. We would therefore like MPI_ABORT to return a non-zero value (much as OASIS_MPI_ABORT in mod_oasis_sys.F90 does, in fact). We can achieve this by locally modifying the MPI_ABORT call in mod_oasis_sys.F90: CALL MPI_ABORT (mpi_comm_global, 1, ierror). However, we don't really want to maintain local modifications to OASIS3-MCT, so is there any reason why it would be a bad idea to change the trunk code to return a non-zero code from MPI_ABORT, or maybe even for OASIS_ABORT to call OASIS_MPI_ABORT, which does return a non-zero code as above (and allows an appropriate code to be passed in via the argument "rcode")? Thanks for any thoughts, Richard
Hi Richard, I will open a ticket and discuss the issue with Tony. Best regards, Laure
Thanks a lot. Richard
Hi Richard, The modifications are now in rev 1532 of OASIS3-MCT_3.0. There is now an optional argument, rcode, for OASIS_ABORT(id_compid, cd_routine, cd_message, rcode), which is used in the call MPI_ABORT(mpi_comm_global, errcode, ierror). If rcode is present then errcode=rcode, else errcode=0. We also modified OASIS_MPI_ABORT(string, rcode), which calls OASIS_ABORT. If rcode is present, it is passed to OASIS_ABORT and then on to MPI_ABORT with errcode=rcode; otherwise the behaviour is as before and errcode=0. Let me know if this is ok for you, Best regards, Laure
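The optional-argument behaviour described in the post above can be sketched in a few lines of Fortran. This is only an illustration of the "if rcode is present then errcode=rcode, else errcode=0" rule, not the code from rev 1532: the helper name resolve_errcode and the program name are hypothetical, and the MPI_ABORT call itself is left out so that the sketch runs without an MPI library.

```fortran
program abort_code_sketch
   implicit none

   ! No rcode supplied: the helper falls back to the default described above.
   print '(a,i0)', 'errcode without rcode: ', resolve_errcode()

   ! rcode supplied: the caller's value is passed through unchanged.
   print '(a,i0)', 'errcode with rcode=1:  ', resolve_errcode(1)

contains

   ! Hypothetical helper (not part of OASIS3-MCT): selects the value that
   ! would be handed to MPI_ABORT(mpi_comm_global, errcode, ierror).
   integer function resolve_errcode(rcode) result(errcode)
      integer, intent(in), optional :: rcode

      if (present(rcode)) then
         errcode = rcode
      else
         errcode = 0   ! default as described in the post above
      end if
   end function resolve_errcode

end program abort_code_sketch
```

With this pattern, a component that never passes rcode always gets the default, so the default value alone decides what unmodified callers report to the environment.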
Hi Laure, Thanks for this. Ideally I would prefer the default behaviour to return a non-zero error code. That's because the component model codes (mainly NEMO, in our case) would otherwise need to be modified in order to take advantage of the non-zero code functionality, and in an ideal world I would prefer not to have to modify the components. Maybe there are good reasons for not doing that, though? As a software developer I can see the argument for not wanting to change existing default behaviour, since it might cause problems for other people. On the other hand, I do think an abort condition should routinely return a non-zero code. I'm sure we can manage somehow either way. Thanks, Richard
Hi Richard, Would it be ok if we set errcode=1 by default in MPI_ABORT(mpi_comm_global, errcode, ierror)? Thanks, Best regards, Laure
Hi Laure, Yes, I think that would be perfect. Thanks, Richard