This course provides an introduction to running R code successfully on HPC clusters. It focuses on the language properties that affect performance and gives an overview of different approaches to parallelizing R code.
In the first part of this course, we establish good practices for avoiding some very common performance bottlenecks and provide hints on running R code under a batch scheduling system.
The second part gives an overview of several commonly used R parallelization packages, discusses which of them suits a given problem, and provides minimal working examples for later reference.
- A basic understanding of programming concepts (variables, loops, if-then-else constructs) is necessary.
- Basic knowledge of working with the Linux command line is recommended.
- This course assumes that participants are already familiar with R; the focus is not on how to write R programs in general, but on how to use specific features of the R language.
- Sessions will be held online via Zoom. Connection details will be made available after successful registration.
- The hands-on sections will be carried out on an HPC cluster that runs the SLURM batch scheduler. While parts of the exercises can also be run locally, following along with the exercises on job scheduling and the Rmpi package requires access to an HPC cluster with SLURM and a working MPI library.