Commit 6fe7f1a · Parent: 5c40d4d
Update d-adaptation/notes.md

d-adaptation/notes.md (+1 -0)

@@ -12,6 +12,7 @@ As noted in the same github issue, alpha/rank scaling modifies the gradient update
UMP redone at dim 8 alpha 8 showed recognizable character but still significantly degraded aesthetics and prompt coherence.
After redoing UMP at dim 8 alpha 8 with fewer cosine restarts (16 -> 9), the results are much better.
Cosine restarts likely affect how much time is spent at a high learning rate, which could be the reason for blowing the model apart.
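
As a way to inspect that reasoning, here is a minimal sketch of a cosine-with-hard-restarts schedule (the generic form used by common schedulers, no warmup; the step count and peak LR below are made-up values, not taken from these runs). Printing it for 16 vs. 9 cycles shows how many times the LR snaps back to its peak and how late in training the last near-peak stretch occurs.

```python
import math

def cosine_with_restarts(step, total_steps, num_cycles, lr_max=1e-4):
    # Minimal cosine-annealing-with-hard-restarts schedule (no warmup):
    # the run is split into num_cycles equal cycles; within each cycle the
    # LR decays from lr_max down to ~0, then jumps back to lr_max.
    progress = step / total_steps
    cycle_pos = (progress * num_cycles) % 1.0  # position inside the current cycle, 0..1
    return lr_max * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))

total_steps = 1800  # hypothetical step budget, not from the notes
for cycles in (16, 9):
    lrs = [cosine_with_restarts(s, total_steps, cycles) for s in range(total_steps)]
    # Steps where the LR jumps back up mark the restarts.
    restarts = [s for s in range(1, total_steps) if lrs[s] > lrs[s - 1]]
    last_high = max(s for s, lr in enumerate(lrs) if lr > 0.9 * lrs[0])
    print(f"{cycles} cycles: {len(restarts)} restarts, "
          f"last near-peak LR at step {last_high} "
          f"({100 * last_high / total_steps:.0f}% of training)")
```

Each restart is an abrupt jump from near zero back to the peak LR, so a higher restart count means more of those jumps landing on an already partially-converged adapter, and the last one landing closer to the end of training.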
+dim 8 alpha 1 retrained with fewer cosine restarts succeeded as well. Supposedly alpha scales the gradient update down, which effectively changes the learning rate, but the relationship is obviously not linear if 1/8x the alpha did not turn the results to garbage here. So the results are far more sensitive to the base LR than to the alpha choice.
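
For context on the alpha claim: in the standard LoRA formulation the adapter's output is multiplied by alpha/rank, so at rank 8 an alpha of 1 scales both the adapter's contribution and the gradients reaching its A/B matrices to 1/8 of what alpha 8 gives. A minimal sketch of that scaling (generic PyTorch, not the trainer's actual module) is below.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = W x + (alpha / rank) * B(A(x)).

    The alpha/rank factor is the scaling discussed in the notes: with
    rank=8, alpha=8 -> scale 1.0, while alpha=1 -> scale 0.125, so the
    adapter's contribution (and the gradients reaching A and B) is 8x
    smaller for the same adapter weights.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.scale = alpha / rank
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

# Example: dim 8 alpha 1 gives scale 0.125 instead of 1.0.
layer = LoRALinear(nn.Linear(320, 320), rank=8, alpha=1.0)
```

How that 1/8 factor interacts with the optimizer and the base LR is exactly what the note is probing; the sketch only makes the scaling itself explicit.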

## Dim
128 dim shows some local noisy patterns. Reranking the model to a lower dim from 128 doesn't get rid of them. Converting the weights of the last up block in the unet does, but it also causes a noticeable change in the generated character. Obviously you could reduce the last up block by a smaller amount instead.
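
For reference, "reranking" a 128-dim LoRA to a lower dim is typically done with a truncated SVD of each module's B·A product, and the same operation could be applied more (or less) aggressively to just the last up block's modules. A rough per-module sketch is below; the function name and the idea of picking a different target rank per block are illustrative assumptions, not the notes' exact procedure.

```python
import torch

def resize_lora_pair(lora_B: torch.Tensor, lora_A: torch.Tensor, new_rank: int):
    # Rerank one LoRA module by taking a truncated SVD of the full delta
    # W_delta = B @ A (shape: out_features x in_features) and keeping only
    # the new_rank strongest singular directions of the learned update.
    delta = lora_B @ lora_A
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :new_rank], S[:new_rank], Vh[:new_rank, :]
    new_B = U * S.sqrt()                 # (out_features, new_rank)
    new_A = S.sqrt().unsqueeze(1) * Vh   # (new_rank, in_features)
    return new_B, new_A

# Hypothetical per-module use: resize most modules to one rank, but pass a
# smaller new_rank for the modules belonging to the last up block to reduce
# it "by a smaller amount" than zeroing it out entirely.
```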