Photo: Taylor Vick.

Running fast on the latest supercomputers is always a challenge. Many developers rely on the Fortran programming language, whose first compiler was released in 1957. Totally oriented toward computation, with a limited set of features continuously updated, Fortran is still popular in the HPC world today. The trick is to keep up with the changes…

On 4 May 2021, the EXCELLERAT Centre of Excellence held a webinar with Wadud Miah on Fortran for High Performance Computing (read more on the EXCELLERAT event webpage). The seminar attracted 115 participants. The content covered four main topics:

  1. A short History of Fortran

  2. Standard Fortran & HPC

  3. GPU programming

  4. Fortran community


Webinar rationale

First, we need to split the use cases to situate the webinar: “I want to write Fortran to run on…”

  1. one core of my laptop (serial, no parallelization or GPU)
  2. all the CPU cores of my workstation
  3. all the CPU cores and GPUs of my workstation
  4. the CPUs of an HPC cluster
  5. the CPUs and GPUs of an HPC hybrid cluster (recently, most of the power of supercomputers has been delivered by the GPUs, for better energy efficiency, and due to the A.I. hype too…)

The first level is for reference, and is obviously only limited by the algorithm. The audience of the webinar came for level 5.

For levels 2 and 3, this is the “shared memory” turf: there you can add OpenMP to your Fortran, but you can also do everything within the Fortran standard, using Fortran co-arrays.
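To fix ideas, here is a minimal OpenMP sketch for level 2; the program and compiler flag are illustrative assumptions, not taken from the webinar.

```fortran
! A minimal shared-memory sketch: use all CPU cores of one machine.
! Compile e.g. with: gfortran -fopenmp omp_demo.f90
program omp_demo
   use omp_lib
   implicit none
   integer, parameter :: n = 1000000
   integer :: i
   real(8) :: total
   total = 0.0d0
   !$omp parallel do reduction(+:total)   ! threads share memory and split the loop
   do i = 1, n
      total = total + 1.0d0 / real(i, 8)
   end do
   !$omp end parallel do
   print *, 'harmonic sum:', total, 'with up to', omp_get_max_threads(), 'threads'
end program omp_demo
```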

For level 4: memory is not shared anymore. In addition to parallelism, you have to deal with memory placement. The data exchanged must fit the buffers: too small and you send too many messages; too large and you start swapping. This is essentially done through MPI libraries.
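As an illustration (not from the webinar), a minimal MPI sketch in Fortran using the mpi_f08 module:

```fortran
! A minimal distributed-memory sketch: each rank contributes its number,
! and rank 0 prints the total. Run e.g. with: mpirun -n 4 ./mpi_demo
program mpi_demo
   use mpi_f08
   implicit none
   integer :: rank, nranks, total
   call MPI_Init()
   call MPI_Comm_rank(MPI_COMM_WORLD, rank)
   call MPI_Comm_size(MPI_COMM_WORLD, nranks)
   ! explicit message passing: reduce each rank's number onto rank 0
   call MPI_Reduce(rank, total, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD)
   if (rank == 0) print *, 'sum of ranks 0 to', nranks - 1, 'is', total
   call MPI_Finalize()
end program mpi_demo
```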

For level 5: recent computers have several levels of shared memory with different speeds on each CPU/GPU node. Directive-based standards such as OpenACC allow you to “port” then “tune” your parallel code for a GPU/CPU machine, with time and sweat.
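A hedged OpenACC sketch of the “port” step (the subroutine and clauses are illustrative, not from the webinar):

```fortran
! Offload a loop to the GPU with OpenACC directives; the data clauses
! (copyin/copy) are the "tune" knobs for host/device transfers.
subroutine axpy_acc(n, a, x, y)
   implicit none
   integer, intent(in) :: n
   real, intent(in)    :: a, x(n)
   real, intent(inout) :: y(n)
   integer :: i
   !$acc parallel loop copyin(x) copy(y)
   do i = 1, n
      y(i) = a * x(i) + y(i)
   end do
end subroutine axpy_acc
```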

The audience of the webinar wants to write Fortran statements that will run on future HPC resources. So, what is the path that will still be included in compilers and optimized (i.e. competitive with a C++ equivalent) for the next decade on the next computers?

General questions

The following questions were originally discussed in the Fortran-lang Discourse thread: questions from a Fortran HPC webinar.

Refactoring

Question: “There is a cost at modernizing code and often it is not considered worth the investment. How can this be addressed? (we are still using some FORTRAN IV code)”

W. M.: There is a cost. Take care of the obsolescent features. This is a good time to move to the next standard, before getting stuck. Run a profiler to find the bottlenecks.

Beliavsky (Fortran discourse): Fortran is backwards compatible, so code can be modernized piecemeal. A March 2020 GitHub thread discussed some available tools for modernization, also listed at the Fortran Wiki.

A. D.: Modernizing software is more about changing people than changing code. Important but never urgent, it is usually de-prioritized. It all lies in the return on investment (ROI), expressed in manpower:

  • Return in manpower, because recent code allows the use of more recent tools, better documentation, and training materials. This saves future manpower.
  • Investment in manpower, because present manpower is engaged without any new features expected.

My humble opinion to limit costs:

  • Identify the critical elements the team really misses today, over coffee breaks: a portability issue, a lack of training material?
  • Start small, with a proof of concept for one person over the summer. Discuss, then plan the next baby step.
  • Ensure the newer version is always backward compatible (easy with Fortran), to keep the users together.
  • You will be tempted to change more than the Fortran standard, and rewrite the code aggressively. Do it with rules, like these non-regression rules.

This should make the modernizing task a more acceptable operation: more a permanent, marginal pet project to train newcomers than an all-hands task force.

Current Fortran software

Question: “Are you aware of any NEW HPC software whose development started with Fortran?”

Beliavsky (Fortran discourse): Yes. One can, for example, search GitHub for parallel Fortran programs that have been recently updated.

awvwgk (Fortran discourse): I found three that started / went open source in 2020:

  • CREST: created on GitHub in Apr. 2020, first commit in Apr. 2020. A conformer-rotamer ensemble search tool.
  • SNaC: created on GitHub in May 2020, first commit in May 2020. A multi-block solver for massively parallel direct numerical simulations (DNS) of fluid flows.
  • Tracmass: created on GitHub in Sep. 2020, first commit in Dec. 2019. A Lagrangian particle tracking code.

Another project, quoted by M. Curcik, is the Energy Exascale Earth System Model (E3SM).

cshift

Question: “Are functions like cshift available in Fortran 90?”

A. D.: They are; see e.g. the GFortran compiler documentation.
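For illustration, a minimal example of the CSHIFT intrinsic, which has been standard since Fortran 90:

```fortran
program demo_cshift
   implicit none
   integer :: a(5) = (/ 1, 2, 3, 4, 5 /)
   ! circular shift by 2 toward lower indices
   print *, cshift(a, shift=2)   ! prints 3 4 5 1 2
end program demo_cshift
```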

Beliavsky (Fortran discourse): There are some tools for calling C++ from Fortran at the Fortran Wiki.

C++ features

Question: “What about newer C++ features (constexpr, auto, modules) and the ISO_C_BINDING? Of course the Fortran binding is for C, but it is often interchangeably used and I wonder if there are plans to support CPP. Essentially at the moment using ISO_C_BINDING seems to restrict both the Fortran features and C(++) features which are usable. How do you see this being resolved?”

W. M.: Interoperability is for C only, not C++ templates. The C++ standard is huge. IMHO, I think that Fortran will/should go this way.
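To make the C-only scope concrete, here is a minimal ISO_C_BINDING sketch (the routine name is illustrative): a Fortran subroutine callable from C, and from C++ only through an extern "C" declaration; C++ templates have no equivalent on the Fortran side.

```fortran
! A Fortran routine with a C-compatible interface.
subroutine daxpy_f(n, a, x, y) bind(c, name='daxpy_f')
   use iso_c_binding, only: c_int, c_double
   implicit none
   integer(c_int), value :: n
   real(c_double), value :: a
   real(c_double), intent(in)    :: x(n)
   real(c_double), intent(inout) :: y(n)
   y = a * x + y
end subroutine daxpy_f
```

On the C/C++ side this is declared as `void daxpy_f(int n, double a, const double *x, double *y);`, wrapped in `extern "C"` for C++.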

Ivan Pribec: Two projects can help to close the gap between C++ and Fortran: shroud and swig-fortran.

Compilers standard compliance

Question: “More a comment than a question: I would personally track the standard compliance of the compiler used, instead of the language standard.”

Beliavsky (Fortran discourse): Chivers & Sleightholme track Fortran compiler compliance to the standards.

C++ container

Question: “Are you aware of any efforts to build a C++ container interface to Fortran arrays using what is available in the “ISO_Fortran_binding.h” header? This would be great to enhance interoperability with C++.”

W. M.: I am afraid I don’t know. Ask Fortran-lang, they might know the answer.

Parallelization

Beliavsky (Fortran discourse): There are links to information about co-arrays and other parallel paradigms in the articles and tutorials sections of the Fortran Wiki.

Question: “Which parallelization methods are generally preferred? OpenMP or Fortran co-arrays?”

No answer was provided. This depends on the use case (see the previous section) and, of course, on the community.

What are the advantages of co-arrays over OpenMP?

W. M.: OpenMP works only within a node. Co-arrays are like MPI: they work across nodes.
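A minimal co-array sketch (Fortran 2008), illustrative only; multi-image runs need e.g. OpenCoarrays with gfortran, or the Intel compiler:

```fortran
program coarray_demo
   implicit none
   integer :: x[*]              ! one copy of x per image (process)
   x = this_image()             ! each image stores its own number
   sync all                     ! barrier before any remote read
   if (this_image() == 1 .and. num_images() > 1) then
      print *, 'image 1 reads x on image 2:', x[2]   ! one-sided remote access
   end if
end program coarray_demo
```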

Space shared between co-arrays and MPI

Question: “I’m a little confused about the conceptual space shared by co-arrays and MPI. Moving forward, what will the standard focus on? Can we expect “CUDA aware co-arrays”? Or should we focus on MPI to the exclusion of co-arrays?”

This question was left unanswered.

Placements in co-arrays

Question: “Hardware topologies are getting more and more layered and deep (e.g. AMD EPYC processors), with increased non-uniformity in memory access. How are “placements” managed with co-arrays? It’s the counterpart of MPI process pinning and OpenMP thread affinity.”

W. M.: This is compiler-dependent.

Co-arrays on multiple devices

Question: “Can co-arrays be used across multiple devices? cores? nodes? GPU? what are the restrictions?”

W. M.: They give multi-node parallelism. I do not think co-array device memory exists.

Co-arrays performances

Question: “Are you aware of any performance comparison in real codes between co-array fortran and other approaches?”

W. M.: This depends highly on the compiler. Intel has improved its performance.

GPU programming

CUDA Fortran on PGI?

W. M.: I think CUDA Fortran is supported by the PGI compiler (to be confirmed).
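CUDA Fortran is indeed a PGI extension, now supported by NVIDIA’s nvfortran. A minimal kernel sketch, illustrative only:

```fortran
! Compile e.g. with: nvfortran -cuda saxpy_mod.f90
module saxpy_mod
contains
   attributes(global) subroutine saxpy(n, a, x, y)
      integer, value :: n
      real, value    :: a
      real :: x(n), y(n)          ! device arrays inside the kernel
      integer :: i
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) y(i) = a * x(i) + y(i)
   end subroutine saxpy
end module saxpy_mod
! launched from host code as: call saxpy<<<grid, block>>>(n, a, x_d, y_d)
```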

CUDA OpenMP vs CUDA MPI

Question: “If CUDA OpenMP does not share memory, is there any advantage with respect to MPI CUDA?”

This question was left unanswered.

host/device memory copies

Question: “Can you please give some more hints on how host/device memory copies can be explicitly handled with OpenMP for GPUs, so that memory residency can be exploited?”

This question was left unanswered.
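Although it was left unanswered, here is a hedged sketch of explicit host/device data management with OpenMP target directives (OpenMP 4.5 and later); the subroutine is illustrative, not from the webinar:

```fortran
subroutine scale_on_gpu(a, n)
   implicit none
   integer, intent(in) :: n
   real, intent(inout) :: a(n)
   integer :: i
   !$omp target data map(tofrom: a)        ! copy a to the device once, on entry
   !$omp target teams distribute parallel do
   do i = 1, n
      a(i) = 2.0 * a(i)                    ! a stays resident on the device
   end do
   !$omp end target data                   ! copy a back to the host here
end subroutine scale_on_gpu
```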

Data race

Question: “What about GPU computing and solutions for the data race issue?”

W. M.: I do not know the answer to that.

High-level GPU programming

Question: “Are you aware of third-party libraries such as ArrayFire for high-level GPU programming? Could you share your feelings on choosing it instead of the standard options you presented (CUDA, OpenACC, or OpenMP)?”

W. M.: There are some DSLs that provide GPU parallelism. You have to think about the portability of your code. They were popular, but are not today. The key is the long-term support for these DSLs, and IMHO they failed to deliver it. On the other hand, the Fortran standard, MPI, and OpenMP proved they were reliable.

Ivan Pribec: ArrayFire has a unified C/C++ API. It seems feasible to build a Fortran interface on top of it. In most applications you will only need a small subset of all the routines, so it could be a viable option. Still, it would be nice to have an fpm-compatible ArrayFire package.

Acknowledgements

The EXCELLERAT project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 823691.

Many thanks to the Fortran Community on Discourse for providing extended answers to these questions.




Antoine Dauptain is a research scientist focused on computer science and engineering topics for HPC.
