Proficiency Training High Performance Computing: Batch Job Scheduling

The Proficiency Training High Performance Computing (ProTHPC) consists of short courses that introduce researchers and students in academia to topics promoting efficient and successful work in the context of HPC.

This ProTHPC will be held entirely online as a web conference.

The courses of the program are organized as modules held on different days.

As the difficulty level of the modules varies greatly, we advise you to consider your participation carefully. ProTHPC is designed as a collection of topics to choose from, rather than as a single workshop to be taken in full.

The topics are aligned with the skill tree of the HPC Certification Forum.

All courses consist of lectures supplemented with practical exercises and will be held in English.

Schedule for all modules

  • 09:00 AM - 12:00 PM Lectures
  • 12:00 PM - 02:00 PM Break / individual time to work on exercises
  • 02:00 PM - 03:00 PM Exercise discussion, Q&A time

Requirements

To participate in this workshop, we strongly recommend that you have an account on one of the Hessian HPC clusters (Lichtenberg, Goethe-HLR, JustHPC, Kassel Linux Cluster, or MaRC2). The practical exercises for all modules require a machine running Linux, and module 2 in particular will be conducted on these clusters. We will also provide the course material via the clusters.


Registration

Upon registration, we will use your contact information for organizational purposes:

  • to inform you about workshop details,
  • to notify you of agenda changes, and
  • to contact you for feedback about the workshop (evaluation).

Your data will not be passed on to third parties, and all personal data will be deleted two months after the end of the workshop. We evaluate the data statistically to improve our services for your research. For further questions, please contact: office@hpc-hessen.de.

Note: The workshop is limited to 30 participants.

Click here to register.

Detailed description

Batch Job Scheduling
Level: basic to advanced. Adapted exercises offer opportunities to deepen knowledge at all levels. As job scheduling makes use of shell scripting, we recommend also attending the introductory course on Linux and shell scripting in the morning.
HPC Skill Tree: K4.2

The resources of HPC systems are managed by a job scheduler, so knowing how to use the scheduling system appropriately is critical for working on HPC systems. This course gives an introduction to the concepts of batch job scheduling and teaches techniques that facilitate submitting and managing multiple and interdependent jobs using advanced features of job schedulers. The concepts are illustrated with the SLURM scheduler, which is used on all university clusters in Hessen.

In this course, participants will learn about:

  • Creating job scripts: Job scripts define a calculation's resource requirements, runtime environment, and the software to run. A plethora of parameters can be set to precisely configure process distribution, memory and GPU allocation, output, and user-side notifications. As HPC computations are almost exclusively started via job scripts, this knowledge is essential for working on a cluster (a minimal example script follows this list).
  • Controlling and monitoring cluster jobs: Using the tools provided by Slurm, users can submit, update, cancel, and monitor their currently running calculations, as well as obtain performance metrics and other metadata from completed jobs. Especially in large computation campaigns, these tools are invaluable for keeping an overview of the project and gathering insights for future research (see the command sketch below).
  • Modelling multi-step workflows with job arrays and dependencies: Often, a project requires running a large number of similar calculations, setting up a chain of calculations that depend on each other's results, or both. We explore the possibilities offered by Slurm's job array and job dependency features, and how both may be used to automate HPC tasks and avoid repetitive work (a combined sketch closes this description).
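
To illustrate the first point, here is a minimal sketch of a Slurm job script; the program name, module names, and resource values are placeholders and will differ from cluster to cluster:

    #!/bin/bash
    #SBATCH --job-name=my_simulation    # name shown in the queue
    #SBATCH --ntasks=4                  # number of parallel processes
    #SBATCH --mem-per-cpu=2G            # memory per allocated core
    #SBATCH --time=01:00:00             # wall-clock limit (hh:mm:ss)
    #SBATCH --output=job_%j.out         # output file, %j = job ID
    #SBATCH --mail-type=END,FAIL        # e-mail notification events

    # Load the software environment (module names are cluster-specific).
    module load gcc openmpi

    # Launch the program via Slurm's process starter.
    srun ./my_program input.dat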
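
For the second point, the following sketch shows the typical command-line workflow around a single job; the job ID 12345 is hypothetical:

    # Submit the job script; Slurm prints the assigned job ID.
    sbatch job.sh

    # List your own pending and running jobs.
    squeue -u $USER

    # Modify a pending job, e.g. shorten its time limit.
    scontrol update JobId=12345 TimeLimit=00:30:00

    # Cancel a job that is no longer needed.
    scancel 12345

    # After completion: query runtime, CPU time, and peak memory.
    sacct -j 12345 --format=JobID,Elapsed,TotalCPU,MaxRSS,State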
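
For the third point, a sketch combining a job array with a dependent follow-up job; the script and file names (array_job.sh, postprocess.sh, analyze) are placeholders:

    #!/bin/bash
    # array_job.sh -- run the same analysis on 100 numbered input files
    #SBATCH --array=1-100%10          # 100 tasks, at most 10 running at once
    #SBATCH --output=task_%A_%a.out   # %A = array job ID, %a = task index

    # Each task selects its input file via the array index.
    srun ./analyze input_${SLURM_ARRAY_TASK_ID}.dat

The dependent job is then chained at submission time:

    # Submit the array, capture its job ID, and queue a post-processing
    # job that starts only once all array tasks have finished successfully.
    array_id=$(sbatch --parsable array_job.sh)
    sbatch --dependency=afterok:${array_id} postprocess.sh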