breakcore2 committed
Commit 6fe7f1a · 1 Parent(s): 5c40d4d

Update d-adaptation/notes.md

Files changed (1)
  1. d-adaptation/notes.md +1 -0
d-adaptation/notes.md CHANGED
@@ -12,6 +12,7 @@ As noted in the same github issue, alpha/rank scaling modifies the gradient upda
 UMP redone at dim 8 alpha 8 showed recognizable character but still significantly degraded aesthetics and prompt coherence.
 After redoing UMP at dim 8 alpha 8 with fewer cosine restarts (16->9), the results are much better.
 Cosine restarts would likely affect how much time we spend at a high learning rate, which could be the reason for blowing the model apart.
+dim 8 alpha 1 retrained at lower cosine restarts succeeded as well. Supposedly alpha scales the gradient down, which pushes the LR up, but the relationship is clearly not linear if a 1/8x alpha did not turn the results into garbage here. So the base LR is far more sensitive than the alpha choice.
 
 ## Dim
 128 dim shows some local noisy patterns. Reranking the model to a lower dim from 128 doesn't get rid of it. Converting the weights of the last up block in the unet does, but also causes a noticeable change in the generated character. Obviously you could reduce the last up block by a smaller amount.
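The cosine-restart lines above can be made concrete with a minimal sketch of a hard-restart cosine LR schedule (the multiplier formula follows the usual transformers-style hard-restart scheduler; the 1800-step run length is an assumed number purely for illustration). Each restart puts the multiplier back at its peak, so the restart count controls how often, and how late into training, the LR returns to its maximum.

```python
import math

def cosine_with_hard_restarts(step: int, total_steps: int, num_cycles: int) -> float:
    """LR multiplier in [0, 1] for a hard-restart cosine schedule (warmup omitted)."""
    progress = min(step / max(1, total_steps), 1.0)
    if progress >= 1.0:
        return 0.0
    # Each of the num_cycles segments decays from 1 down to 0, then snaps back to 1.
    return 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))

total = 1800  # assumed total step count, purely for illustration
for cycles in (16, 9):
    cycle_len = total / cycles
    just_after_last_restart = int((cycles - 1) * cycle_len) + 1
    m = cosine_with_hard_restarts(just_after_last_restart, total, cycles)
    print(f"{cycles} cycles: LR multiplier returns to ~1.0 every {cycle_len:.0f} steps; "
          f"right after the final restart (step {just_after_last_restart}) it is {m:.2f}, "
          f"with about {total - just_after_last_restart} steps left to decay")
```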
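For the new alpha line, here is a minimal sketch of where alpha/rank scaling sits in a standard LoRA forward pass (the LoRALinear class and its shapes are illustrative assumptions, not the actual training code): the low-rank delta is multiplied by alpha/dim, so dim 8 / alpha 8 gives a scale of 1.0 while dim 8 / alpha 1 shrinks the delta and its gradients by 8x, which an adaptive-LR method like D-Adaptation will partly compensate for with a larger step size.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Standard LoRA wrapper; class and attribute names are illustrative."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # pretrained weights stay frozen
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)          # delta starts at zero
        # alpha/rank scaling:
        #   rank 8, alpha 8 -> scale 1.0   (update passes through unscaled)
        #   rank 8, alpha 1 -> scale 0.125 (delta and its gradients are ~8x smaller,
        #   so an adaptive-LR optimizer such as D-Adaptation tends to estimate a larger step)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))

layer = LoRALinear(nn.Linear(320, 320), rank=8, alpha=1.0)
print(layer.scale)  # 0.125
```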
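On the Dim section, "reranking" is read here as SVD-resizing a trained LoRA pair down to a lower rank, roughly as sketched below (the function name, shapes, and single-pair scope are assumptions; real resize scripts work per module and could treat the last up block separately, as the note suggests).

```python
import torch

def rerank_lora_pair(up: torch.Tensor, down: torch.Tensor, new_rank: int,
                     scale: float = 1.0):
    """Reduce one LoRA up/down pair to new_rank via truncated SVD of its delta.

    up: (out_features, rank), down: (rank, in_features); names are illustrative.
    """
    delta = scale * (up @ down)                    # full delta weight this pair adds
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :new_rank], S[:new_rank], Vh[:new_rank, :]
    sqrt_s = torch.diag(S.sqrt())                  # split singular values across factors
    # up_new @ down_new is the best rank-new_rank approximation of delta
    return U @ sqrt_s, sqrt_s @ Vh

# e.g. taking one dim-128 pair down to dim 32 (shapes are arbitrary examples)
up128, down128 = torch.randn(640, 128), torch.randn(128, 320)
up32, down32 = rerank_lora_pair(up128, down128, 32)
print(up32.shape, down32.shape)  # torch.Size([640, 32]) torch.Size([32, 320])
```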