Towards Foundation Models for Computational Fluid Dynamics: a first look with near-wall flows

PhD thesis | Parallel Algorithms | Artificial Intelligence, Machine Learning, Python

Start date: 1 October 2024
Mission duration : 36 months

CERFACS is a scientific research and training organization focused on computational physics. It fosters synergies between applied mathematicians, domain-expert physicists, and computer scientists, who collaborate to advance the understanding of complex problems ranging from Earth physics to engineering systems that involve turbulent flows. For more information about CERFACS, see https://www.cerfacs.fr.

This work will be performed in the Algo-COOP team (https://cerfacs.fr/coop/), which provides crucial support to CERFACS’ mission by improving computational methods, developing innovative parallel algorithms and data assimilation techniques, and leveraging Artificial Intelligence (AI) for Science.

Context

AI saw rapid developments and impressive achievements in the 2010s in domains such as Natural Language Processing (NLP), speech recognition and synthesis, computer vision, and image synthesis. Comparable applications in engineering and design, however, remain scarce. Several factors can explain this, such as the prior availability of precise and robust numerical methods, the expectation that new methods offer comparable robustness and reliability, and the lack of high-quality, diverse datasets.

In NLP, a paradigm shift began this decade with the emergence of so-called “foundation models” [1]. These models are trained on very large datasets with self-supervised learning techniques and learn generic features that can be transferred to a variety of downstream tasks, possibly after a fine-tuning step. Famously, this has led to Generative Pre-trained Transformers (GPT) and ChatGPT, arguably propelling AI to even greater fame than in the previous decade. A similar trend has been observed in vision with Vision Transformers (ViT), and on multi-modal data with recent works such as CLIP, DALL-E, and Flamingo.

This trend has begun to reach scientific fields, especially Earth sciences for weather and climate problems. The recent ClimaX paper [2] shows promising results for pre-training a foundation model for weather and climate that can be efficiently adapted to general-purpose tasks related to the Earth's atmosphere, alongside related data-driven weather forecasting models [3, 4]. The key ingredients of the approach are: training the foundation model on rich, heterogeneous simulation data; using a Vision-Transformer-like architecture that can ingest different subsets of atmospheric variables; and designing a global pretext task that consists of forecasting an arbitrary set of input variables at an arbitrary lead time into the future.

Objectives

The goal of this PhD thesis will be to apply the general self-supervised training approach and deep architectures of [2] to a database of multiple Computational Fluid Dynamics (CFD) simulations previously assembled at CERFACS [5, 6]. The statistical and generalization properties of these algorithms will be assessed (e.g., the inpainting capabilities of the learned foundation models across the different CFD tasks). Specifically, the models will be used to generate near-wall and inlet flows, which will then be evaluated. These flows must be unsteady, closely resemble realistic turbulence, and show no significant disturbances at the interface with the resolved flow.

The candidate will then seek to improve this baseline using novel deep models, focusing on the nature of CFD data and how it differs from climate data. Notably, the use of increasingly large spatial contexts will be explored through different data encodings (meshless frameworks such as PINNs or Neural Operators, graph networks, interpolation onto voxel grids). Depending on the latest developments in the literature and in available open-source software, integrating the temporal context, for instance with autoregressive methods [7], will also be investigated. In all cases, the quality of the generated flows, and their potential for near-wall flow substitution and inflow generation, will be carefully assessed.

This work will be performed in HPC environments. First, training on the heterogeneous dataset of CFD simulations raises significant data challenges, which the candidate will need to address. Second, the trained models will be evaluated directly inside one of CERFACS’ flagship HPC solvers, [AVBP](https://cerfacs.fr/en/computational-fluid-dynamics-softwares/). Building on previous work, notably the AI-physics solver coupling library [PhyDLL](https://phydll.readthedocs.io/), the candidate will set up validation cases for hybrid simulations, in which the solver and the trained network work together to produce the time-varying solution. This strategy must scale as well as AVBP itself on massively parallel architectures combining CPUs and GPUs.

The thesis project is expected to last 36 months, starting October 2024.

Profile

Currently in the final year of a Master’s degree in numerical physics or a related field, you have some experience with Machine Learning (ML), or a strong interest in these technologies and the desire to learn about them. Alternatively, you have a major in Computer Science (CS) and ML and are interested in physical modeling applications. This position requires active reading of the scientific literature in the field and the ability to learn quickly. In the research lab environment, initiative, autonomy, creativity, and the ability to synthesize are highly valued. Experience with CFD solvers, deep learning libraries, or a data processing language (Python, R, MATLAB) is a plus.

This thesis is part of the PHLUSIM ANR project and is co-supervised by CERFACS and Sorbonne University.

Contacts

Please send your résumé and a short summary of your motivations, including why this position appeals to you, to the following email addresses:

– Luciano Drozda (drozda@cerfacs.fr)

– Nicolas Odier (odier@cerfacs.fr)

Bibliography

[1] Bommasani, R., & Liang, P. (2021). Reflections on foundation models. Stanford Institute for Human-Centered AI.

[2] Nguyen, T., Brandstetter, J., Kapoor, A., Gupta, J. K., & Grover, A. (2023). ClimaX: A foundation model for weather and climate. arXiv preprint arXiv:2301.10343.

[3] Pathak, J., et al. (2022). FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. arXiv preprint arXiv:2202.11214.

[4] Lam, R., et al. (2022). GraphCast: Learning skillful medium-range global weather forecasting. arXiv preprint arXiv:2212.12794.

[5] Dupuy, D., Odier, N., & Lapeyre, C. (2023). Data-driven wall modeling for turbulent separated flows. Journal of Computational Physics, 487, 112173. doi:10.1016/j.jcp.2023.112173

[6] Dupuy, D., Odier, N., Lapeyre, C., & Papadogiannis, D. (2023). Modeling the wall shear stress in large-eddy simulation using graph neural networks. Data-Centric Engineering, 4, e7. doi:10.1017/dce.2023.2

[7] Kohl, G., Chen, L. W., & Thuerey, N. (2023). Turbulent Flow Simulation using Autoregressive Conditional Diffusion Models. arXiv preprint arXiv:2309.01745.
