Lemmings (lemmings-hpc) is an open-source Python code designed to simplify job scheduling on HPC clusters. It does so by offering a set of functionalities that require no a priori knowledge of how to interact with a job scheduler. The emphasis can then be placed on workflow management. Lemmings also ensures that these workflows remain portable between different machines and machine environments.
This post focuses on what the workflow entails, how it has been conceptualised and implemented, and what makes it so flexible to use. Hopefully it will also convince the reader of the vast possibilities it offers for any type of task involving HPC interaction.
A representation of the workflow structure is depicted in the image.
It is essentially a conditional loop with 7 methods (indicated in blue) over which the user has full control.
These methods are Python functions in which the user is free to perform any actions of interest. They are defined at different positions within the workflow’s loop. Two of these methods must return a boolean (see sketch), which determines the branch taken through the loop. Not all methods have to be used.
The actual definition of the workflow and its methods will be detailed in another post.
A spawn job (or child process) is set up after the prior_to_job() method and is the starting point of the “inner loop” within the workflow.
The workflow interacts with the HPC’s job scheduler between the prepare_run() and the check_on_end() methods. The interaction consists of the submission of two batch files, a batch_job and a batch_pjob, which is automatically generated by lemmings.
By default, the maximum allowed CPU time, a required input of lemmings, will be the decisive parameter if no explicit control is taken by the user in the check_on_end() method.
The intent of lemmings is, however, that the user takes control by specifying one or more conditions that have to be met in order to stop the simulation. The maximum allowed CPU time still remains a decisive parameter on top of the user-specified one(s).
If the user-defined condition(s) in check_on_end() have not been met, the method should return False, in which case a new spawn job will be initiated.
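The stopping logic described above can be illustrated with a small, self-contained toy model. This is only a sketch and not the lemmings API: the class `ToyWorkflow`, the driver `run_loop`, and the parameters `target_steps`, `steps_per_job` and `cpu_time_per_job` are hypothetical names introduced for illustration, and the scheduler interaction is replaced by a simple counter update. Only the method names prior_to_job(), prepare_run() and check_on_end() come from the workflow described here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkflow:
    """Toy stand-in for a user-defined workflow (NOT the lemmings API)."""

    target_steps: int = 250    # hypothetical user-defined stop condition
    steps_per_job: int = 100   # hypothetical progress made by one spawn job
    steps_done: int = 0
    log: list = field(default_factory=list)

    def prior_to_job(self):
        # Runs once, before the first spawn job is set up.
        self.log.append("prior_to_job")

    def prepare_run(self):
        # Runs at the start of every pass through the inner loop.
        self.log.append("prepare_run")

    def check_on_end(self):
        # True: user condition met, stop. False: request a new spawn job.
        return self.steps_done >= self.target_steps


def run_loop(workflow, max_cpu_time, cpu_time_per_job):
    """Drive the toy loop until the user condition or the CPU-time limit stops it."""
    used_cpu_time = 0.0
    n_jobs = 0
    workflow.prior_to_job()
    while True:
        workflow.prepare_run()
        # Placeholder for the scheduler interaction (assumption: one job
        # simply advances the step counter and consumes a fixed CPU time).
        workflow.steps_done += workflow.steps_per_job
        used_cpu_time += cpu_time_per_job
        n_jobs += 1
        if workflow.check_on_end():
            return n_jobs, "user condition met"
        if used_cpu_time >= max_cpu_time:
            return n_jobs, "maximum CPU time reached"


if __name__ == "__main__":
    print(run_loop(ToyWorkflow(), max_cpu_time=10.0, cpu_time_per_job=1.0))
    print(run_loop(ToyWorkflow(), max_cpu_time=2.0, cpu_time_per_job=1.0))
```

In the first call the user condition stops the loop after three spawn jobs. In the second call the CPU-time limit stops it first, which mirrors the statement that the maximum allowed CPU time remains decisive on top of the user-specified condition(s).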
All methods can share information with each other in the form of a database.yml file. It is the preferred way for the user to rely on this for interacting