Deep Learning on HPC Systems
Content
This course covers how Deep Learning can be used efficiently on HPC systems. Modern clusters are built for distributed and parallel computation. We will introduce the basics of parallelizing computations with deep neural networks: distributing the computational workload allows Deep Learning models to scale beyond a single machine.
We will discuss different scaling approaches that are common when working with large Deep Learning models or big datasets. Participants will scale their own neural network (using the Keras API of TensorFlow and the distributed deep learning framework Horovod) over multiple nodes on the Lichtenberg cluster. We will use an introductory image recognition example to demonstrate how an HPC cluster can significantly speed up neural network training.
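To give a flavor of the pattern practiced in the hands-on session, here is a minimal, generic sketch of data-parallel training with the Keras API of TensorFlow and Horovod. It is not taken from the course materials; the model, the dataset (MNIST), and all hyperparameters are illustrative assumptions.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

# Initialize Horovod and pin each process to one local GPU (if any).
hvd.init()
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Illustrative dataset and model: MNIST with a small dense classifier.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0
# Each worker trains on its own shard of the data.
x_shard = x_train[hvd.rank()::hvd.size()]
y_shard = y_train[hvd.rank()::hvd.size()]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Scale the learning rate with the number of workers and wrap the
# optimizer so that gradients are averaged across all processes.
opt = tf.keras.optimizers.SGD(learning_rate=0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(optimizer=opt,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Broadcast the initial weights from rank 0 so all workers start from
# the same state; let only rank 0 print training progress.
model.fit(x_shard, y_shard,
          batch_size=64, epochs=1,
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
          verbose=1 if hvd.rank() == 0 else 0)
```

A script like this would typically be launched with, e.g., `horovodrun -np 4 python train.py` (or via `mpirun` inside a batch job); each process then trains on its shard while Horovod averages the gradients across all workers.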
Agenda
- 09:00 - 12:00 Morning Session
- 12:00 - 13:00 Lunch Break
- 13:00 - 17:00 Afternoon Session
Trainer(s)
- Dr. Marcel Giar (HKHLR)
- Tim Jammer (HKHLR)
Participation
- Basic knowledge of training simple models with the Keras API of TensorFlow is a prerequisite for this course!
- Sessions will be held online via Zoom. Connection details will be made available after successful registration.
- The hands-on session will be carried out via an SSH connection to the Lichtenberg cluster at the Technical University of Darmstadt. Participants should use their own computers running either Linux/macOS, or Windows with MobaXterm installed. Participants without an account on the Lichtenberg cluster will be provided with a guest account for the course.
- The maximum number of participants is 20.