Deep Learning on HPC Systems

Content

Deep learning models can quickly scale beyond the computational capabilities of desktop computers and workstations. For medium- to large-sized models, distributing the computational workload becomes unavoidable. HPC clusters offer a large amount of computational resources with high availability and are therefore well suited for the parallel execution of deep learning applications.

In this course, participants will learn how to use an HPC cluster to efficiently solve deep learning tasks. We will cover the basics of parallelizing computations with deep neural networks.

Common scaling approaches for working with large deep learning models and big datasets will also be discussed. We will show how to use multiple cores on a single compute node to train deep neural networks.
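
As a small preview of the single-node case, the following sketch shows one common way to make TensorFlow use a given number of CPU cores. It assumes TensorFlow 2.x and a Slurm-managed job; the environment variable lookup and the thread counts are illustrative placeholders, not a prescription from the course.

    import os
    import tensorflow as tf

    # Number of CPU cores allocated to the job; reading SLURM_CPUS_PER_TASK is an
    # assumption about the batch system, with a fallback of 4 if it is not set.
    cores = int(os.environ.get("SLURM_CPUS_PER_TASK", "4"))

    # Threads used to parallelize a single operation (e.g. a large matrix multiplication).
    tf.config.threading.set_intra_op_parallelism_threads(cores)
    # Threads used to run independent operations concurrently.
    tf.config.threading.set_inter_op_parallelism_threads(2)

Note that these settings must be applied before TensorFlow executes any operation.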

Participants will scale their own deep learning model over multiple compute nodes with TensorFlow, the Keras API and the distributed deep learning library Horovod.
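
To give a rough idea of what this looks like in practice, the sketch below is a minimal, hypothetical Horovod/Keras training script (the model, data, and hyperparameters are placeholders, not course material). It shows the typical Horovod additions: initialize Horovod, pin each process to one GPU, scale the learning rate by the number of workers, wrap the optimizer, and broadcast the initial weights from rank 0.

    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # one Horovod process per GPU / MPI rank

    # Pin each process to a single local GPU (no-op on CPU-only nodes).
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    # Placeholder model; participants would plug in their own architecture.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # Scale the learning rate with the number of workers and let Horovod
    # average gradients across all processes.
    opt = tf.keras.optimizers.Adam(1e-3 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)

    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    callbacks = [
        # Ensure all workers start from identical initial weights.
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    ]

    # Random placeholder data standing in for a real (sharded) dataset.
    x_train = np.random.rand(1024, 784).astype("float32")
    y_train = np.random.randint(0, 10, size=(1024,))
    model.fit(x_train, y_train, batch_size=64, epochs=1,
              callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)

Such a script is started with one process per GPU, for example with horovodrun -np 4 python train.py or via the cluster's MPI launcher; the exact launch command depends on the site setup.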


Prerequisites

  • Basic Python knowledge
  • Basic understanding of deep learning concepts
  • Familiarity with the Keras API and TensorFlow is advantageous.
  • The course will be conducted on the Lichtenberg II high-performance cluster. We will provide guest accounts to participants if necessary. However, guest accounts are limited to the first 20 registrations.

Registration