TheDrummer committed on
Commit
cdaac82
·
verified ·
1 Parent(s): 7f48392

Update README.md

  1. README.md +5 -5
README.md CHANGED
@@ -116,11 +116,11 @@ WIP
  - Take note of a few things
  - Top layers = Ending layers (nearer to output)
  - Bottom layers = Starting layers (nearer to input)
- - Training a non-upscaled model affects the top layers first and slowly descends to the bottom layers over time.
- - Training an upscaled model with a slice of layers duplicated twice does two things:
- - The duplicated slices EACH have their own gradient.
- - There's a 'ceiling value' for each of these duplicated slices.
- - Even when Tunguska's duplicated slices are nearly saturated, the resulting model remains coherent and even performant.
+ - Training a normal, non-upscaled model affects the top layers first and slowly descends to the bottom layers over time.
+ - Training an upscaled model with two slices of duplicate layers does two things:
+ - Each slice of duplicated layers has its own gradient.
+ - There's a 'ceiling value' for the duplicated layers in these slices.
+ - Even when Tunguska's slices of duplicated layers are nearly saturated, the resulting model remains coherent and even performant.
  - Takeaways
  - These slice of layers are more connected to each other than to the model's entirety.
  - [Question] Does this mean that the **original layer** before the slice is the one holding that whole duplicated slice together?
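
For readers unfamiliar with the term, the "upscaled model with two slices of duplicate layers" described in the changed lines can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `upscaled_layer_order`, the layer count, and the slice boundaries are hypothetical and are not taken from the README or from Tunguska's actual configuration. It assumes each duplicated slice is repeated once, directly after the original slice.

```python
from typing import List, Tuple


def upscaled_layer_order(n_layers: int, slices: List[Tuple[int, int]]) -> List[int]:
    """Return, for each position in the upscaled stack, the index of the
    original layer it was copied from.

    `slices` holds half-open (start, end) ranges of original layers.
    Each range is repeated once, right after its original occurrence.
    (Hypothetical helper, for illustration only.)
    """
    order: List[int] = []
    for i in range(n_layers):
        order.append(i)  # the original layer keeps its place
        for start, end in slices:
            if i == end - 1:
                # The slice just ended: append its duplicate copy.
                order.extend(range(start, end))
    return order


# Toy example: 8 layers (0 = bottom, nearest to input; 7 = top, nearest to output),
# with two duplicated slices, layers 2-3 and layers 5-6.
order = upscaled_layer_order(8, [(2, 4), (5, 7)])
print(order)  # [0, 1, 2, 3, 2, 3, 4, 5, 6, 5, 6, 7]
```

In this toy layout, positions 4-5 and 9-10 hold the duplicated layers, which are the ones the notes describe as having their own gradient and 'ceiling value'.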