
We should code less…

Sustainable Scientific Computing focuses on keeping a balance between the actual usage of HPC scientific software and the resources allocated to the project.

Understanding the hidden costs

At Cerfacs, we have recognized the significance of indirect costs associated with HPC computation. Beyond the computation hours and development efforts, there are various recurring burdens that impact calculation codes. Some of these include:

  • Provision (installations, porting, rights, versions)
  • Support (user questions, crashes, analysis)
  • Validation (accuracy, performance, physical fidelity)
  • End-user training
  • Training for community experts
  • Transition to new technology
  • Reintegration of previously done work

These aspects contribute to what we call the “technological debt” of HPC. Managing this debt has become a core competency at Cerfacs within our Sustainable Programming initiative. This activity is now part of the advanced HPC support offered by the ALGO-COOP team, alongside algorithmics, machine learning, and exascale computing.

The disciplines connected to Sustainable Scientific Computing

Code Development best practices

While there are numerous resources available to learn computer science, HPC scientific software development requires a combination of expertise beyond computer science: physical modeling, applied mathematics, numerics, and massively parallel computing. As a consequence, many developers in this domain view the computer science part as a necessary evil, learned through imitation.

The key is to identify the skills that are most relevant to this audience. Indeed, best practices in code development are highly specific to each community. It begins with discussions and sharing of current practices and programming culture among contributors. Only then can new practices be introduced gradually, with a focus on ensuring the adoption and acceptance of these practices.

There are, however, common traits in best practices:

  • Do not code alone: discuss the code's purpose and readability with your peers.
  • Keep things as simple as possible by sticking to the critical, immediate needs. Read the spirit from this rude website designer.
  • Search actively before coding: “If it is a good idea, someone probably did it already and better. If nobody already did it, it is probably a bad idea.”

HPC technology watch

Experts often remark that high-performance technologies evolve faster than people can keep up with. For example, GPU-powered supercomputers have gained significant prominence since 2010, as seen in the increasing presence of such systems in the TOP500 supercomputers list.

Figure: Systems using GPU accelerators on the TOP500, from “Reinventing High Performance Computing: Challenges and Opportunities”, Reed, Gannon & Dongarra, arXiv:2203.02544

However, many HPC software contributors still do not consider GPUs in their daily work. The remaining minority engage in HPC technology watch, which serves two distinct purposes:

  1. Legacy software. A fully operational HPC code takes around a decade to reach full maturity. Technology watchers search for the most efficient and least intrusive way of porting this software to new architectures.

  2. Brand new software. When starting from scratch, technology watchers must pick, among the many outstanding approaches being developed, the ones that will be long-lasting.

Only a handful of brand new software projects will become legacy. On the other hand, new approaches cannot be tested on large legacy software right away: it is far too risky. This is why tech watchers always work on both sides.

Lean technology transfer

Raising the maturity of HPC software so that it reaches its intended users is unfortunately the source of many indirect costs. The maturity level is gauged on a Technology Readiness Level (TRL) scale, where each new level implies more people and more missions. The most unavoidable mission is user support, which illustrates the challenge well.

Support is, a priori, a good activity for research: discussions with users can be steered towards consultancy, nurturing long-term collaborations. As for bugs and crashes, they should be fixed sooner or later for the greater good. The downside is the need for competent and readily available manpower: the support workload is hard to predict and it interrupts other work.

In the end, technology transfer is a matter of constantly matching the resources allocated to the code with its actual usage.

Codemetrics

Codemetrics is a relatively new field in HPC. Building on the work of A. Tornhill, the objective is to provide accurate information to HPC development teams. This includes details about the code itself, such as size, complexity, dependencies, and adherence to specific writing standards. Codemetrics also focuses on team habits, such as commit sizes and frequency, author evolution, and growth rate. Finally, it can produce useful maps for training and planning.

By monitoring and analyzing this information, codemetrics contributes to Sustainable Scientific Computing initiatives. However, it is important to ensure that implementing codemetrics does not consume excessive manpower.
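As an illustration of the team-habit metrics mentioned above (commit sizes, commit frequency, author evolution), here is a minimal Python sketch. It is not the tooling used at Cerfacs. It assumes the history was exported beforehand with `git log --numstat --date=short --pretty=format:'@%h|%an|%ad'`; the leading `@` marker, the function names and the sample log are hypothetical choices made for this example.

```python
from collections import defaultdict


def parse_numstat(log_text):
    """Parse the text of `git log --numstat` (header format '@sha|author|date')
    into a list of commit dictionaries. The '@' marker is an assumption of
    this sketch, used to tell header lines from per-file stat lines."""
    commits = []
    for line in log_text.splitlines():
        if line.startswith("@"):
            # Split from both ends so an author name containing '|' survives.
            sha, rest = line[1:].split("|", 1)
            author, date = rest.rsplit("|", 1)
            commits.append({"sha": sha, "author": author, "date": date,
                            "added": 0, "deleted": 0})
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3 or not commits:
            continue  # blank line or anything that is not a numstat row
        added, deleted, _path = parts
        if added == "-":
            continue  # git reports '-' for binary files: no line counts
        commits[-1]["added"] += int(added)
        commits[-1]["deleted"] += int(deleted)
    return commits


def monthly_summary(commits):
    """Group commits by month (YYYY-MM): commit count, lines changed, authors."""
    summary = defaultdict(
        lambda: {"commits": 0, "lines_changed": 0, "authors": set()})
    for commit in commits:
        entry = summary[commit["date"][:7]]
        entry["commits"] += 1
        entry["lines_changed"] += commit["added"] + commit["deleted"]
        entry["authors"].add(commit["author"])
    return dict(summary)


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = "\n".join([
    "@a1b2c3d|Alice|2023-01-05",
    "10\t2\tsrc/solver.f90",
    "3\t0\tREADME.md",
    "",
    "@d4e5f6a|Bob|2023-01-20",
    "-\t-\tdata/mesh.bin",
    "5\t5\tsrc/solver.f90",
    "",
    "@0f9e8d7|Alice|2023-02-02",
    "1\t1\tsrc/io.f90",
])

if __name__ == "__main__":
    for month, stats in sorted(monthly_summary(parse_numstat(SAMPLE_LOG)).items()):
        print(month, stats["commits"], stats["lines_changed"],
              sorted(stats["authors"]))
```

Working on an exported log rather than calling git directly keeps the sketch cheap to run and to test, in line with the concern that codemetrics should not consume excessive manpower.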

Takeaway

This short text described the different paths to mitigate the technological debt of HPC scientific software. Hidden costs can be cut with:

  • Code Development best practices
  • HPC technology watch
  • Lean technology transfer
  • Codemetrics



Antoine Dauptain is a research scientist focused on computer science and engineering topics for HPC.
