pipeline-parallelism-with-controllable-memory


Nyamdavaa Amar committed on Jun 19, 2024

Commit

2a9343e

1 Parent(s): ec79aec

Add approximate measure of memories.

Files changed (3)

README.md CHANGED

@@ -13,6 +13,4 @@ license: apache-2.0
 # Pipeline Parallellism with Controllable Memory
-Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).
-Bubble Rate here is calculated as (1 - longest stage time/(F+B+W)/m).
+Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).

adaptive_schedule.py CHANGED

@@ -376,17 +376,6 @@ def squeeze_without_change_order(schedules, m):
                 identifier_cnt[i][identifier] += 1
                 identifier_index[_cnt * p + i][identifier] = index
                 stage_index[i] = index + 1
-    while True:
-        if(len(squeezed[0]) == 1):
-            break
-        allempty = True
-        for x in squeezed:
-            if x[-1] != ' ':
-                allempty = False
-        if allempty == False:
-            break
-        for x in squeezed:
-            del x[-1]
     return squeezed
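
For context, the loop removed in this hunk trimmed trailing time slots that were idle on every stage, while always keeping at least one column. A minimal sketch of that behavior follows. It assumes, as the removed lines suggest, that `squeezed` is a list of equal-length per-stage lists in which `' '` marks an idle slot. The helper name `trim_trailing_empty_columns` is hypothetical, and unlike the removed code it returns a copy instead of mutating its argument.

```python
def trim_trailing_empty_columns(rows, empty=" "):
    """Drop trailing columns where every row holds `empty`; keep at least one column.

    Hypothetical helper mirroring the removed loop. `rows` is assumed to be a
    non-empty list of equal-length lists of single-character strings.
    """
    rows = [list(r) for r in rows]  # work on a copy (the removed code edited in place)
    while len(rows[0]) > 1 and all(r[-1] == empty for r in rows):
        for r in rows:
            del r[-1]
    return rows


if __name__ == "__main__":
    schedule = [["F", "B", " ", " "],
                ["F", " ", " ", " "]]
    print(trim_trailing_empty_columns(schedule))
    # [['F', 'B'], ['F', ' ']]
```

With the loop gone, `squeeze_without_change_order` returns `squeezed` with any such trailing idle columns left in place.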

description1.md CHANGED

@@ -6,4 +6,9 @@ From our findings, we need approximately 1/3 memory under ideal conditions (F, B
 Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).
+| Comparison assuming T_F=T_B=T_W                       | 1F1B    | V-Min  | V-Half     | V-ZB |
+| ----------------------------------------------------- | ------- |------- | ---------- | ---- |
+| Bubble Rate                                           | (p-1)/m | ????   | ???        |   0  |
+| Activation Memory <br> (Compared to 1F1B)             |    p   |  (p+4)//3 | (p+2)//2 |   p  |
 Bubble Rate here is calculated as (1 - longest stage time/(F+B+W)/m).
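
The added table and the bubble-rate sentence can be checked numerically. The sketch below is only an illustration under the table's assumption T_F = T_B = T_W; the function names are hypothetical and are not taken from the repository. It also assumes the standard 1F1B makespan of (m + p - 1)(F + B + W). Read left to right, the sentence gives 1 - T/((F+B+W)·m), whereas the table's 1F1B entry (p-1)/m corresponds to T/((F+B+W)·m) - 1. The sketch exposes the shared ratio and the table-matching form, and makes no claim about which one the app computes.

```python
def stage_time_ratio(longest_stage_time, f, b, w, m):
    """Longest stage time divided by the ideal busy time m * (F + B + W)."""
    return longest_stage_time / (f + b + w) / m


def bubble_rate_table_form(longest_stage_time, f, b, w, m):
    """Ratio minus one: the form that reproduces the table's (p-1)/m for 1F1B."""
    return stage_time_ratio(longest_stage_time, f, b, w, m) - 1


def activation_memory(p):
    """Activation-memory entries copied from the table, in the table's own units."""
    return {
        "1F1B": p,
        "V-Min": (p + 4) // 3,
        "V-Half": (p + 2) // 2,
        "V-ZB": p,
    }


if __name__ == "__main__":
    p, m = 4, 8          # pipeline stages and microbatches (example values)
    f = b = w = 1.0      # equal pass times, as the table assumes
    longest_1f1b = (m + p - 1) * (f + b + w)  # assumed 1F1B makespan
    print(bubble_rate_table_form(longest_1f1b, f, b, w, m))  # 0.375, i.e. (p-1)/m
    print(activation_memory(p))
    # {'1F1B': 4, 'V-Min': 2, 'V-Half': 3, 'V-ZB': 4}
```

The `????` and `???` bubble-rate cells for V-Min and V-Half are left unfilled in the commit, so the sketch does not attempt to compute them.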