breakcore2 committed
Commit 6fe7f1a · 1 Parent(s): 5c40d4d

Update d-adaptation/notes.md

Files changed (1)
  1. d-adaptation/notes.md +1 -0
d-adaptation/notes.md CHANGED
@@ -12,6 +12,7 @@ As noted in the same github issue, alpha/rank scaling modifies the gradient upda
 UMP redone at dim 8 alpha 8 showed recognizable character but still significantly degraded aesthetics and prompt coherence.
 After redoing UMP at dim 8 alpha 8 with fewer cosine restarts (16->9), the results are much better.
 Cosine restarts would likely affect how much time we spend at a high learning rate, which could be the reason for blowing the model apart.
+dim 8 alpha 1 retrained at lower cosine restarts succeeded as well. Supposedly alpha scales the gradient down, which pushes the LR up, but the relationship is clearly not linear if a 1/8x alpha did not turn the results into garbage here. So the base LR is far more sensitive than the alpha choice.
 
 ## Dim
 128 dim shows some local noisy patterns. Reranking the model to a lower dim from 128 doesn't get rid of it. Converting the weights of the last up block in the unet does, but also causes a noticeable change in the generated character. Obviously you could reduce the last up block by a smaller amount.
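The cosine-restart lines above can be made concrete with a minimal sketch of a hard-restart cosine LR schedule (the multiplier formula follows the usual transformers-style hard-restart scheduler; the 1800-step run length is an assumed number purely for illustration). Each restart puts the multiplier back at its peak, so the restart count controls how often, and how late into training, the LR returns to its maximum.

```python
import math

def cosine_with_hard_restarts(step: int, total_steps: int, num_cycles: int) -> float:
    """LR multiplier in [0, 1] for a hard-restart cosine schedule (warmup omitted)."""
    progress = min(step / max(1, total_steps), 1.0)
    if progress >= 1.0:
        return 0.0
    # Each of the num_cycles segments decays from 1 down to 0, then snaps back to 1.
    return 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))

total = 1800  # assumed total step count, purely for illustration
for cycles in (16, 9):
    cycle_len = total / cycles
    just_after_last_restart = int((cycles - 1) * cycle_len) + 1
    m = cosine_with_hard_restarts(just_after_last_restart, total, cycles)
    print(f"{cycles} cycles: LR multiplier returns to ~1.0 every {cycle_len:.0f} steps; "
          f"right after the final restart (step {just_after_last_restart}) it is {m:.2f}, "
          f"with about {total - just_after_last_restart} steps left to decay")
```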
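For the new alpha line, here is a minimal sketch of where alpha/rank scaling sits in a standard LoRA forward pass (the LoRALinear class and its shapes are illustrative assumptions, not the actual training code): the low-rank delta is multiplied by alpha/dim, so dim 8 / alpha 8 gives a scale of 1.0 while dim 8 / alpha 1 shrinks the delta and its gradients by 8x, which an adaptive-LR method like D-Adaptation will partly compensate for with a larger step size.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Standard LoRA wrapper; class and attribute names are illustrative."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # pretrained weights stay frozen
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)          # delta starts at zero
        # alpha/rank scaling:
        #   rank 8, alpha 8 -> scale 1.0   (update passes through unscaled)
        #   rank 8, alpha 1 -> scale 0.125 (delta and its gradients are ~8x smaller,
        #   so an adaptive-LR optimizer such as D-Adaptation tends to estimate a larger step)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

layer = LoRALinear(nn.Linear(320, 320), rank=8, alpha=1.0)
print(layer.scale)  # 0.125
```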
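On the Dim section, "reranking" is read here as SVD-resizing a trained LoRA pair down to a lower rank, roughly as sketched below (the function name, shapes, and single-pair scope are assumptions; real resize scripts work per module and could treat the last up block separately, as the note suggests).

```python
import torch

def rerank_lora_pair(up: torch.Tensor, down: torch.Tensor, new_rank: int,
                     scale: float = 1.0):
    """Reduce one LoRA up/down pair to new_rank via truncated SVD of its delta.

    up: (out_features, rank), down: (rank, in_features); names are illustrative.
    """
    delta = scale * (up @ down)                    # full delta weight this pair adds
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :new_rank], S[:new_rank], Vh[:new_rank, :]
    sqrt_s = torch.diag(S.sqrt())                  # split singular values across factors
    # up_new @ down_new is the best rank-new_rank approximation of delta
    return U @ sqrt_s, sqrt_s @ Vh

# e.g. taking one dim-128 pair down to dim 32 (shapes are arbitrary examples)
up128, down128 = torch.randn(640, 128), torch.randn(128, 320)
up32, down32 = rerank_lora_pair(up128, down128, 32)
print(up32.shape, down32.shape)  # torch.Size([640, 32]) torch.Size([32, 320])
```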