GPipe - Combining Data and Pipeline Parallelism

There are 2 main ways of achieving model parallelism:

  1. split each layer across nodes (horizontal model parallelism) - FSDP
  2. assign different layers to different nodes (vertical model parallelism) - pipeline parallelism (see the sketch below)
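
A minimal PyTorch sketch of approach 2, assuming two CUDA devices are available; the stage split, layer sizes, and names (`TwoStageModel`, `stage0`, `stage1`) are made up for illustration:

```python
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    """Vertical split: the first chunk of layers lives on GPU 0, the rest on GPU 1."""
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Sequential(nn.Linear(4096, 10)).to("cuda:1")

    def forward(self, x):
        # Activations hop from device to device; only one stage is busy at a time.
        x = self.stage0(x.to("cuda:0"))
        return self.stage1(x.to("cuda:1"))

model = TwoStageModel()
out = model(torch.randn(32, 1024))   # runs stage0 on cuda:0, then stage1 on cuda:1
```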

The main challenge with approach 2 comes from the way DNN training works: each stage needs the previous stage's output before it can run its forward pass, and the backward pass cannot start until the forward pass has finished, so with a single mini-batch only one node is doing useful work at any given time.

Bubbles in the pipeline => low efficiency. The only real advantage over single-node execution is that the final weight-update step is parallelized across the nodes.
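
A rough way to see the cost (K here is an illustrative stage count, not something fixed by the post):

```python
# Naive pipeline parallelism: one mini-batch flows through K stages, forward
# and then backward. Each stage is busy for only its own forward and backward
# slots out of the ~2*K slots the mini-batch occupies end to end.
K = 4                    # number of pipeline stages (illustrative)
busy_slots = 2           # one forward slot + one backward slot per stage
total_slots = 2 * K      # time until the mini-batch's backward pass finishes
print(f"per-device utilization ≈ {busy_slots / total_slots:.0%}")   # 25% for K = 4
```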



GPipe Key Idea



GPipe's key idea is to split each mini-batch into M smaller micro-batches and pipeline them through the stages. So by adding a dose of data parallelism (micro-batches), we now get fewer bubbles in the computation space-time diagram.
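
To see how the micro-batch count M helps, the GPipe paper's bubble-overhead estimate is O((K-1)/(M+K-1)) for K stages and M micro-batches; the concrete numbers below are just illustrative:

```python
# Fraction of the space-time diagram lost to bubbles for a K-stage pipeline
# running M micro-batches per mini-batch, using GPipe's (K-1)/(M+K-1) estimate.
K = 4
for M in (1, 4, 16, 64):
    bubble = (K - 1) / (M + K - 1)
    print(f"M = {M:3d}: bubble fraction ≈ {bubble:.0%}")
# M = 1 recovers the naive schedule (~75% idle for K = 4); larger M shrinks
# the bubbles, at the cost of less work per stage per micro-batch.
```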


GPipe Details
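
One detail worth noting: gradients from all M micro-batches are accumulated and applied in a single synchronous update per mini-batch, so the gradient matches what training on the whole mini-batch at once would produce. Below is a minimal sketch of just that accumulation step, reusing the `model` (and imports) from the first sketch; it runs the micro-batches sequentially rather than overlapping them across stages, so it illustrates the synchronous update, not the pipelining itself:

```python
# Split one mini-batch into M micro-batches, accumulate gradients across them,
# and apply one synchronous optimizer step at the end.
M = 4                                         # micro-batches per mini-batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
inputs = torch.randn(32, 1024)                # one mini-batch (illustrative shapes)
targets = torch.randint(0, 10, (32,))

optimizer.zero_grad()
for micro_x, micro_y in zip(inputs.chunk(M), targets.chunk(M)):
    loss = loss_fn(model(micro_x), micro_y.to("cuda:1")) / M
    loss.backward()       # gradients accumulate in .grad across micro-batches
optimizer.step()          # one synchronous weight update per mini-batch
```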
