Multi-architecture parallelism using the Kokkos C++ library
Cerfacs is Qualiopi certified for its training activities
Duration : 2.5 days / 17 hours
Face to face training session
About 15 years ago, Nvidia introduced the CUDA architecture, i.e. the use of graphics processing units for general-purpose high-performance computing tasks. Today, it is still very challenging to write new applications, or simply to refactor existing ones, with portability and performance in mind, that is, to make efficient use of the available HPC hardware resources, whether multicore CPUs (x86, ARM64, …) or GPUs (Nvidia / AMD / Intel).
This training provides a dedicated introduction to the open source Kokkos C++ library, which has been developed for about a decade, mainly in the US (Sandia and Oak Ridge labs), with funding from the Department of Energy under the Exascale Computing Project (ECP). We will present a theoretical and practical introduction to the Kokkos programming model, illustrating its advantages over alternative shared-memory programming models such as OpenMP/OpenACC.
By the end of the training, participants are expected to be able to integrate the Kokkos library into their own HPC projects.
Objective of the training
Provide a theoretical and practical introduction to the Kokkos programming model, from abstract concepts, i.e. parallel programming patterns (parallel for, reduce and scan loops) and data containers, to an overview of the ecosystem (profiling tools, linear algebra, Python bindings, …).
On completion of this course, you will be able to:
- build, install and integrate Kokkos into an existing application, or write a Kokkos application from scratch
- write Kokkos computing kernels and manage data containers on heterogeneous platforms (CPU/GPU)
- assess performance using profiling tools
- refactor existing code for performance portability
The training alternates between theoretical presentations and practical work. A multiple-choice questionnaire serves as the final evaluation. The training room is equipped with computers; the work can be done in sub-groups of two people.
Referent teacher: Pierre KESTENER
This course is for anyone wishing to learn how to write parallel HPC code that runs across a variety of hardware architectures and achieves performance portability using a modern C++ library (Kokkos).
- Be an employee of a European company; a certificate from the employer is required
- Have completed at least 5 years of higher education, or be a Master 2 trainee
- Knowledge of the C++ language
- Some knowledge of parallel programming: multithreading and/or OpenMP basics
- Interest in HPC application development
- The training can take place in French or English depending on the audience; level B2 of the CEFR is required.
In order to verify that the prerequisites are satisfied, the following questionnaire must be completed. You need at least 75% correct answers to be admitted to this training session. If you do not reach this score, your registration will not be validated. You have only two attempts to complete it.
Questionnaire : https://forms.gle/HZwCM9Jbt2bM8LT
Deadline for registration: 15 days before the starting date of each training
Before signing up, you may wish to inform us of any particular constraints (schedules, health, unavailability…) at the following e-mail address: firstname.lastname@example.org
This training course, financed as part of the European EuroCC2 project, is free of charge and reserved for employees of companies in European Union member states. It normally costs 1360 € excluding VAT.
However, your registration is subject to the payment of a deposit of 200 €. This sum will be returned to you at the end of the course, provided that you have actually attended. If not, it will be retained as compensation for the harm caused by needlessly keeping other people on the waiting list.
Monday, December 11, from 2pm to 5pm
- Refresher on hardware architecture basics (CPU / GPU) and on performance measurements (memory bandwidth, FLOPs): all that is needed to understand why writing portable and performant code is difficult. Practical exercise.
- General introduction to the Kokkos C++ library: its origin, overview of concepts and software abstractions, abstract machine model (host/device), the Kokkos parallel programming model
- Examples of production codes using C++/Kokkos
- Practical exercise: how to build and install the Kokkos C++ library with the CPU OpenMP backend and the GPU/CUDA backend; how to choose a compiler; how to integrate Kokkos in a CMake-based project; how to write a modulefile to simplify the use of the library.
Tuesday, December 12, from 2pm to 5pm
- Writing C++/Kokkos computing kernels: parallel programming patterns (for loops, reduce loops, scan loops).
- Overview of the execution space and memory space concepts: why we need them and how to use them. Concept of execution policy.
- Using hardware-aware data containers: multidimensional arrays (Kokkos::View) and hash maps (Kokkos::UnorderedMap)
- Profiling tools (Kokkos::Tools)
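For readers unfamiliar with the three loop patterns covered in this session, the sketch below illustrates their semantics only. It is written in plain, sequential standard C++ and is not Kokkos code; in Kokkos, the corresponding entry points are Kokkos::parallel_for, Kokkos::parallel_reduce and Kokkos::parallel_scan. The function names scale, sum and prefix_sum are hypothetical, chosen for illustration, and do not come from the course material.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// "for" pattern: each iteration is independent of all the others,
// so the iterations could be distributed freely across threads.
// (Hypothetical helper, sequential here.)
std::vector<double> scale(const std::vector<double>& x, double a) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i];
    }
    return y;
}

// "reduce" pattern: all elements are combined into a single value
// with an associative operation (here, addition).
double sum(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// "scan" pattern: inclusive prefix sum, out[i] = x[0] + ... + x[i].
std::vector<double> prefix_sum(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    std::partial_sum(x.begin(), x.end(), out.begin());
    return out;
}
```

With the input {1, 2, 3, 4}, scale with a = 2 gives {2, 4, 6, 8}, sum gives 10, and prefix_sum gives {1, 3, 6, 10}. The point of a library such as Kokkos is to express these same patterns once and have them run on either a CPU or a GPU backend.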
Wednesday, December 13, from 2pm to 5pm
- Advanced use of Kokkos/C++: hierarchical parallelism (about using teams of threads). Examples of use.
- Coupling MPI and Kokkos for distributed applications. Introduction to Kokkos/RemoteSpace (experimental feature for accessing memory on remote nodes)
- Overview of the Kokkos ecosystem: linear algebra (KokkosKernels), pykokkos (Python bindings)
- Kokkos for Fortran users: Kokkos/FLCL (Fortran Language Compatibility Layer)
A final exam will be conducted during the training.