Checkin-Checkout cost for HPC
Posted by Anonymous at September 29 2015
Hello,
Several people running OASIS coupled models on a large number of CPUs report that the CHECKIN/CHECKOUT operations can significantly slow down their simulations. The problem appears with OASIS3-MCT, which needs collective MPI operations to calculate the mean value over the whole model domain.
These operations were much cheaper with the single-process, independent OASIS3 executable. That is why I think HPC users need a clearer warning (at runtime? in the documentation?).
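To illustrate why a whole-domain mean implies a collective step, here is a minimal sketch in plain Python. It is not OASIS3-MCT code: the "ranks" are simulated with a list, the function names and the field values are made up, and the combining step only stands in for what would be an allreduce-style sum in a real MPI program.

```python
# Illustrative only: plain Python standing in for MPI ranks, not OASIS3-MCT code.

def local_partial(values, mask):
    """Per-rank work: sum and count of the non-masked points this rank owns."""
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept), len(kept)


def global_mean(partials):
    """Stand-in for the collective step: every rank's (sum, count) pair
    must be combined before a whole-domain mean exists."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else float("nan")


# Hypothetical field split over three "ranks": (values, mask) per subdomain.
subdomains = [
    ([1.0, 2.0, 3.0], [True, True, False]),
    ([4.0, 5.0], [True, True]),
    ([6.0], [True]),
]

partials = [local_partial(v, m) for v, m in subdomains]
mean = global_mean(partials)
print(mean)
```

The local part scales with the subdomain size, but the combining step involves every process, which is the part that tends to become expensive when it is repeated for each field at each coupling step on many CPUs.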
Eric Maisonnave
Posted by Anonymous at October 19 2015
Hi Eric,
I will add a note on that in the documentation. I will also send an email to the oasis users list about this problem.
Thanks, Best regards, Laure
Posted by Anonymous at April 26 2017
Hi Laure and Eric,
Because of the cost of CHECKIN/CHECKOUT, we suppressed these operations. But we want to add them back in some runs, in something like a verbose mode.
1) Do you have a tool to automatically add the required lines to the namcouple:
set verbose mode in logprt for each field
add 2 additional operations, CHECKIN and CHECKOUT, together with the 2 required INT=1 lines, one for CHECKIN and one for CHECKOUT
If not, I will create one based on sed/awk.
2) After that, I can give you my numbers on the additional cost of the CHECK operations.
Thanks for the information. Marie-Alice Foujols
Posted by Anonymous at August 25 2017
Hi Marie-Alice,
I created a new branch OASIS3-MCT_2.0_branch_r1818_IPSL from your IPSL revision of OASIS3-MCT_2.0.
I put the field diagnostics under the condition IF (OASIS_Debug >=1 ) THEN, so you no longer need the namcouple keys CHECKIN/CHECKOUT, and in production mode (OASIS_Debug = 0) nothing is written. Below is an example of what we have now:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
oasis_advance_avdiag DIAGS FOR COUPLING FIELD : FSENDANA
oasis_advance_avdiag DIAGS on non masked points for SENT fields
oasis_advance_avdiag DIAGS on all points for RECEIVED fields (FIELD=0 on masked points)
oasis_advance_avdiag at time and time+lag : 0 0
oasis_advance_avdiag Min and its location : 1.00355421213 1 1
oasis_advance_avdiag Max and its location : 2.59263376838 50 148
oasis_advance_avdiag Mean value : 0.645899536725
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
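For readers who want to see the idea of gating diagnostics on a debug level, here is a rough Python sketch. It is only an illustration and not the code in the branch: the function name `field_diags`, its arguments and the small test field are hypothetical, indices are 0-based (row, column), and the statistics are taken over non-masked points only.

```python
# Illustrative only: a debug-gated min/max/mean, not the oasis_advance_avdiag code.

def field_diags(field, mask, debug_level):
    """Return min/max (with locations) and mean over non-masked points,
    or None when debug_level is 0 so that no work is done in production."""
    if debug_level < 1:
        return None
    vmin = vmax = None
    total, count = 0.0, 0
    for j, row in enumerate(field):
        for i, value in enumerate(row):
            if not mask[j][i]:
                continue  # skip masked points
            if vmin is None or value < vmin[0]:
                vmin = (value, (j, i))
            if vmax is None or value > vmax[0]:
                vmax = (value, (j, i))
            total += value
            count += 1
    mean = total / count if count else None
    return {"min": vmin, "max": vmax, "mean": mean}


# Hypothetical 2x2 field with one masked point.
field = [[1.0, 9.0], [3.0, 5.0]]
mask = [[True, False], [True, True]]

quiet = field_diags(field, mask, debug_level=0)
verbose = field_diags(field, mask, debug_level=1)
print(quiet, verbose)
```

The point of the early return is that the cost of the diagnostics is only paid when the debug level asks for them.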
To get the branch : svn checkout -r 2007 https://oasis3mct.cerfacs.fr/svn/branches/OASIS3-MCT_2.0_branch_r1818_IPSL/oasis3-mct/ .
Best regards, Laure