Leonid Kuligin


Leonid Kuligin is currently an ML cloud engineer in the Google Cloud Professional Services Organization. He has more than 10 years of product engineering experience at leading Russian and German tech companies, including HeadHunter Group and Yandex. He focuses on complex backends for B2C products built on machine learning and distributed data processing.

Distributed TensorFlow: training and profiling

Level: Intermediate+

The complexity of state-of-the-art ML models is growing exponentially, so we need to scale both the input data pipelines and the training itself. We're going to discuss how TensorFlow addresses these challenges, and how you can use its built-in profiler to collect profiling dumps and then visualize and analyze them.
We'll discuss in detail how TensorFlow works in distributed mode and how you can tune a data input pipeline built on the tf.data.Dataset API. We'll also look at how to collect and interpret dumps when profiling the model training process with TensorFlow.
Ideally, attendees should have some experience with the high-level TensorFlow APIs and with profiling software applications.