Cerfacs Enter the world of high performance ...

Markus HOLZER thesis defense

  Friday 29 November 2024, from 14h30 to 16h30

  PhD thesis  FAU, Germany

Code generation in a lattice Boltzmann framework for exascale computing

École doctorale 475  Mathématiques, Informatique et Télécommunications de Toulouse (EDMITT)

https://fau.zoom-x.de/j/97059928526?pwd=L2hmVHJoSjg5SHd4YiswOUV2OFZUdz09

Exascale supercomputers are computing systems capable of performing 10^18 floating-point operations per second (FLOPS). The supercomputer Frontier was the first to break this barrier of one exaFLOPS, officially initiating the era of exascale computing in 2022. The immense scale of such systems poses significant challenges for developing codes that can fully exploit this computing power. Furthermore, the increasingly heterogeneous hardware employed by today’s leading supercomputers adds another layer of complexity. In the field of computational fluid dynamics, it is therefore crucial to carefully consider every aspect of a numerical simulation, starting with the design and selection of algorithms suited to such environments. For example, algorithms like the lattice Boltzmann method are explicitly designed with massive parallelism in mind, making them a notable alternative to other more established methods. Nevertheless, a highly efficient implementation of this algorithm must be tailored to the respective hardware for optimal usage of resources.

To address these challenges, this thesis explores the use of code generation through an embedded domain-specific language. Code generation enables us to target specific hardware architectures and apply precise optimisations that leverage domain-specific knowledge. In this research, we extend and redesign the Python package lbmpy to support state-of-the-art variants of the lattice Boltzmann method. lbmpy represents the lattice Boltzmann method symbolically using a computer algebra system, allowing the automatic derivation of discretised equations based on user-defined specifications. To obtain equations with a minimal number of floating-point operations, we fundamentally enhance the simplification capabilities of lbmpy in this work. The discretised equations derived by lbmpy are passed to the Python package pystencils, which generates from them highly optimised, architecture-specific compute kernels in a lower-level language. We expand the range of supported hardware platforms and overhaul crucial aspects of the code generation process, such as the typing system, to improve performance and maintainability.
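
As a rough illustration of the code generation idea described above, the following sketch emits a specialised kernel from a lattice specification. It does not use lbmpy or pystencils and makes no claim about their APIs: it relies only on the Python standard library, all function names (`dot_with_u`, `generate_equilibrium_source`, `compile_kernel`) are hypothetical, and it assumes the standard D2Q9 lattice with the second-order equilibrium. To stay runnable it emits Python source rather than a lower-level language.

```python
from fractions import Fraction

# D2Q9 lattice: discrete velocities and weights (standard textbook values).
VELOCITIES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
              (1, 1), (-1, -1), (1, -1), (-1, 1)]
WEIGHTS = [Fraction(4, 9)] + [Fraction(1, 9)] * 4 + [Fraction(1, 36)] * 4


def dot_with_u(cx, cy):
    """Return source text for c . u, dropping zero terms at generation time."""
    terms = []
    for c, name in ((cx, "ux"), (cy, "uy")):
        if c == 1:
            terms.append(f"+ {name}")
        elif c == -1:
            terms.append(f"- {name}")
    return " ".join(terms)  # empty string for the rest velocity (0, 0)


def generate_equilibrium_source(name="equilibrium_d2q9"):
    """Emit an unrolled function computing the nine equilibrium populations."""
    lines = [f"def {name}(rho, ux, uy):",
             "    usq = ux * ux + uy * uy"]
    for i, ((cx, cy), w) in enumerate(zip(VELOCITIES, WEIGHTS)):
        cu = dot_with_u(cx, cy)
        weight = f"({w.numerator}.0 / {w.denominator}.0)"
        if cu:
            lines.append(f"    cu = {cu}")
            body = "1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq"
        else:
            # Simplification known at generation time: c . u is zero here.
            body = "1.0 - 1.5 * usq"
        lines.append(f"    f{i} = {weight} * rho * ({body})")
    lines.append("    return [" + ", ".join(f"f{i}" for i in range(9)) + "]")
    return "\n".join(lines) + "\n"


def compile_kernel(source, name="equilibrium_d2q9"):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]
```

Calling `print(generate_equilibrium_source())` shows the unrolled kernel, and `compile_kernel(generate_equilibrium_source())(1.0, 0.1, 0.0)` evaluates it. The point of the sketch is that terms known to vanish for a given lattice direction are removed before the kernel ever runs, which is a much simplified analogue of the symbolic simplification described in the abstract.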

A sophisticated integration of these compute kernels into the massively parallel multiphysics framework waLBerla is also developed, with an in-depth discussion of the key implementation components. One of the most significant advancements in this integration is the generation of highly specialised interpolation kernels. These kernels are essential for transferring information between cells of differing resolutions within the simulation domain, ensuring the accuracy and consistency of the data across varying grid sizes. This development has enabled us to perform the largest simulation run to date using the lattice Boltzmann method on a nonuniform domain, utilising more than 4000 AMD MI250X graphics processing units. The ability to efficiently manage such a vast and heterogeneous computational environment underscores the effectiveness of our approach in scaling complex simulations on next-generation hardware platforms.

We verify and validate our approach by simulating turbulent single-phase flow around a sphere using a nonuniform mesh configuration on graphics processing units, successfully reproducing the drag crisis, a complex phenomenon that occurs at Reynolds numbers above 200 000. Additionally, we demonstrate the capabilities of our method through slug flow simulations, offering new insights into the behaviour of Taylor bubbles in complex annular pipe configurations. Finally, we analyse the trajectories of droplets under the influence of a laser heat source in three-dimensional thermocapillary flows. To evaluate the performance of our approach, we present results from all these scenarios on the latest CPU and GPGPU hardware. We provide single-node performance data and put the measured results in context using appropriate performance models. Lastly, we examine the scalability of our developments by presenting both weak and strong scaling results on several of the world’s leading supercomputers.

JURY

Jonas LATT  Université de Genève  Referee
Christian HOLM  Université de Stuttgart  Referee
Ulrich RÜDE  FAU / CERFACS  Referee / Thesis supervisor
Jens HARTING  FZ-Jülich  Examiner
Tobias GÜNTER  FAU  President of the Jury
Eric CLIMENT  IMFT  Examiner
Gabriel STAFFELBACH  ONERA  Co-supervisor
Catherine LAMBERT  CERFACS  Thesis supervisor
