Nyamdavaa Amar committed on
Commit 2a9343e
1 Parent(s): ec79aec

Add approximate measure of memories.

Files changed (3)
  1. README.md +1 -3
  2. adaptive_schedule.py +0 -11
  3. description1.md +5 -0
README.md CHANGED
@@ -13,6 +13,4 @@ license: apache-2.0
 
 # Pipeline Parallellism with Controllable Memory
 
-Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).
-
-Bubble Rate here is calculated as (1 - longest stage time/(F+B+W)/m).
+Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).
adaptive_schedule.py CHANGED
@@ -376,17 +376,6 @@ def squeeze_without_change_order(schedules, m):
             identifier_cnt[i][identifier] += 1
             identifier_index[_cnt * p + i][identifier] = index
             stage_index[i] = index + 1
-    while True:
-        if(len(squeezed[0]) == 1):
-            break
-        allempty = True
-        for x in squeezed:
-            if x[-1] != ' ':
-                allempty = False
-        if allempty == False:
-            break
-        for x in squeezed:
-            del x[-1]
     return squeezed
 
 
description1.md CHANGED
@@ -6,4 +6,9 @@ From our findings, we need approximately 1/3 memory under ideal conditions (F, B
 
 Check out our paper at [Arxiv](https://arxiv.org/abs/2405.15362).
 
+| Comparison assuming T_F=T_B=T_W | 1F1B | V-Min | V-Half | V-ZB |
+| ----------------------------------------------------- | ------- |------- | ---------- | ---- |
+| Bubble Rate | (p-1)/m | ???? | ??? | 0 |
+| Activation Memory <br> (Compared to 1F1B) | p | (p+4)//3 | (p+2)//2 | p |
+
 Bubble Rate here is calculated as (1 - longest stage time/(F+B+W)/m).
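The formulas added to the description1.md table can be checked numerically. The following is a minimal sketch, not code from this commit: the function names `activation_memory` and `bubble_rate` are hypothetical, and the values `p = 8` and `m = 24` are arbitrary. It also assumes that the 1F1B longest stage time is (m + p - 1)(F + B + W), and that the bubble rate is taken as longest stage time / ((F+B+W) * m) - 1, which is the sign convention under which the table's 1F1B entry (p-1)/m comes out positive.

```python
def activation_memory(p):
    """Approximate activation memory per schedule, in the table's units
    (1F1B uses p). Formulas are copied from the description1.md table."""
    return {
        "1F1B": p,
        "V-Min": (p + 4) // 3,
        "V-Half": (p + 2) // 2,
        "V-ZB": p,
    }


def bubble_rate(longest_stage_time, f, b, w, m):
    """Extra time relative to the ideal m * (F + B + W).

    Assumption: sign convention chosen so that 1F1B yields (p - 1) / m,
    matching the table; this is not taken from the repository code.
    """
    return longest_stage_time / ((f + b + w) * m) - 1


if __name__ == "__main__":
    p, m = 8, 24  # arbitrary illustrative values
    print(activation_memory(p))

    # Assumed 1F1B longest stage time with T_F = T_B = T_W = 1.
    one_f_one_b_time = (m + p - 1) * 3.0
    print(bubble_rate(one_f_one_b_time, 1.0, 1.0, 1.0, m), (p - 1) / m)
```

The V-Min and V-Half bubble rates are left out because the table itself does not state them.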