Hi, first, thank you for the amazing OASIS3-MCT. We are a terrestrial systems research team in IBG-3, Research Center Juelich, Germany. Currently we are trying to extend our CLM-ParFlow-COSMO-OASIS3(-MCT) program to MPMD on our BlueGene/Q system JUQUEEN. Do you think the effort is closer to "simply change MPI_COMM_WORLD in oasis_init_comp to the split communicator passed in by the model", or to "you need to rewrite the whole OASIS3-MCT"? Setting aside possible name conflicts, could you kindly give us a hint as to whether the domain decomposition / rank / communicator management in OASIS3-MCT is suitable for MPMD? Many thanks! Guowei HE g.he@fz-juelich.de
Hi Guowei, OASIS3-MCT is designed to run in MPMD. When coupling models with the coupler, there must be one executable per model. In a coupled model, MPI_COMM_WORLD becomes the communicator gathering all the processes of the different models, and indeed you have to replace the original MPI_COMM_WORLD of your models by the local communicator returned by oasis_get_localcomm (see the User Guide https://oasis.cerfacs.fr/wp-content/uploads/sites/114/2021/02/GLOBC-Valcke_TR_OASIS3-MCT_2.0_2013.pdf, section 2.2.2). Best regards, Laure
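[For later readers: the initialization sequence Laure describes looks roughly like the following in a component. This is a minimal sketch based on the OASIS3-MCT Fortran API; the component name 'model1' and the error handling are illustrative.]

```fortran
program model1
  use mod_oasis            ! OASIS3-MCT Fortran interface
  implicit none
  integer :: comp_id, local_comm, ierror

  ! oasis_init_comp calls MPI_Init internally if it was not called yet
  call oasis_init_comp(comp_id, 'model1', ierror)

  ! Retrieve the communicator gathering only this model's processes;
  ! use it everywhere the model previously used MPI_COMM_WORLD
  call oasis_get_localcomm(local_comm, ierror)

  ! ... oasis_def_partition / oasis_def_var / oasis_enddef, time loop ...

  call oasis_terminate(ierror)
end program model1
```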
Hi Laure, Thanks for the reply! Sorry, I haven't stated the question correctly. We are doing data assimilation, which requires several sets of coupled models running under one MPI_COMM_WORLD. For example, in the tutorial we have model1 and model2; in our case, we need several sets of model1 and model2, running independently and simultaneously. We tried a brute-force hack: changing every MPI_COMM_WORLD in the oasis_init_comp subroutine to a new communicator argument, obtained from MPI_COMM_SPLIT and passed to the subroutine. Our initial test seems to be working: for example, with 16 procs, model1 on ranks 0-3 communicates with model2 on ranks 4-7, and model1 on ranks 8-11 with model2 on ranks 12-15. They produce the same result as the original run_tutorial with nproc_exe1=4, nproc_exe2=4. But do you think this approach works for more complicated cases? We also notice that OASIS still references MPI_COMM_WORLD in the inter_comm subroutine. Moreover, we are concerned about whether MCT also manipulates MPI_COMM_WORLD directly. Many thanks! /Guowei
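[The brute-force split described above can be sketched as follows. This is an illustration of the layout in the post, not OASIS code; the resulting ensemble_comm is what the hack substitutes for MPI_COMM_WORLD inside oasis_init_comp.]

```fortran
program split_sketch
  use mpi
  implicit none
  integer :: world_rank, color, ensemble_comm, ierror

  call MPI_Init(ierror)
  call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierror)

  ! 16-rank layout from the post: two independent ensemble members.
  ! Ranks 0-7  -> member 0 (model1 on 0-3,  model2 on 4-7)
  ! Ranks 8-15 -> member 1 (model1 on 8-11, model2 on 12-15)
  color = world_rank / 8
  call MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &
                      ensemble_comm, ierror)

  ! ensemble_comm would then be handed to the modified oasis_init_comp

  call MPI_Finalize(ierror)
end program split_sketch
```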
Hi Guowei, There is a simpler way to define a new communicator for each sub-group of processes of each model. In model1 and model2, you must define a different comp_name for each sub-group of processes and then call oasis_init_comp with that comp_name. Let me know if this works better. Best regards, Laure
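[A sketch of this suggestion: each sub-group of processes calls oasis_init_comp with a distinct component name, and oasis_get_localcomm then returns a communicator for that sub-group only. The member index and name format are illustrative; in practice the member number would come from, e.g., a namelist read before initialization.]

```fortran
program model1
  use mod_oasis
  implicit none
  integer :: comp_id, local_comm, member, ierror
  character(len=16) :: comp_name

  ! Each ensemble member uses its own component name,
  ! e.g. 'model1_01', 'model1_02', ...
  member = 1                                 ! illustrative; read from a namelist
  write(comp_name, '(a,i2.2)') 'model1_', member

  call oasis_init_comp(comp_id, trim(comp_name), ierror)
  call oasis_get_localcomm(local_comm, ierror)
  ! local_comm now gathers only the processes of this member's model1
end program model1
```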