The goal of this course is to give hints and tips on how to write well-performing R code and to present an overview of the different ways to parallelize code in R, while avoiding common pitfalls. The concepts presented should also enable you to refactor existing R code (or port code from another language) more effectively, and to run your code on an HPC cluster using a batch scheduler.
The course is divided into two parts, each with several hands-on sessions. The first part focuses on serial performance. The second part looks at the different approaches to parallelization that R offers and gives hints on when and how to use them to improve code performance.
You will learn:
- R design philosophy
- Taking advantage of R's vectorized functions
- Memory management and data partitioning
- Running R scripts on a cluster scheduling system
- Simple multicore programming with mclapply/mcmapply
- Parallel programming with foreach
- Using Rmpi
- Random number generation and parallelism
- Hands-on workshop
Target group and requirements
- A basic understanding of programming concepts is needed.
- For the hands-on part, experience with the Linux shell is recommended.
- Previous knowledge of R and working with schedulers is an advantage, but not a hard prerequisite.
- Participants are expected to bring their own laptop running either Linux/macOS, or Windows with MobaXterm installed (see downloads for instructions). The hands-on sessions will be carried out through an SSH connection.
- A WiFi connection will be provided via Eduroam (guest accounts are available).
- This module is limited to 20 participants.
- Instructor: René Sitt (HKHLR)
- Date: Wednesday, September 25, 9:00-18:00
- Location: TU Darmstadt, Alexanderstraße 2, Karl-Plagge-Haus S1|22, Room 403 "New York"
- Students (Bachelor/Master): €5.-
- PhD students and members of universities or public research institutes: €20.-
- All others: €200.-
The fee includes coffee breaks, lunch, and the evening event.