zer0int committed
Commit 31d01a8 (verified)
1 Parent(s): 7f99ca9

Update README.md

  1. README.md +56 -3
README.md CHANGED
@@ -1,3 +1,56 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - SPRIGHT-T2I/spright_coco
+ ---
+ ## A fine-tune of OpenAI / CLIP ViT-L/14 that has an unprecedented ImageNet/ObjectNet accuracy of ~0.90 (original pre-trained model / OpenAI's CLIP: ~0.85)**
+
+ Made possible with Geometric Parametrization (GmP):
+
+ ```
+
+ "Normal" CLIP MLP (multi-layer perceptron):
+
+ (mlp): Sequential(
+ |-(c_fc): Linear(in_features=1024, out_features=4096, bias=True)
+ | (gelu): QuickGELU()
+ |-}-(c_proj): Linear(in_features=4096, out_features=1024, bias=True)
+ | |
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.weight
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.bias
+ |
+ |---- visual.transformer.resblocks.0.mlp.c_proj.weight
+ |---- visual.transformer.resblocks.0.mlp.c_proj.bias
+
+
+ GmP CLIP MLP:
+
+ Weight decomposition into:
+ - radial component 'r' as norm of pre-trained weights
+ - angular component 'theta' as normalized direction
+ -> preserves weight vectors' directionality and magnitude
+
+ (mlp): Sequential(
+ |-(c_fc): GeometricLinear()
+ | (gelu): QuickGELU()
+ |-}-(c_proj): GeometricLinear()
+ | |
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.r
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.theta
+ | |-- visual.transformer.resblocks.0.mlp.c_fc.bias
+ |
+ |---- visual.transformer.resblocks.0.mlp.c_proj.r
+ |---- visual.transformer.resblocks.0.mlp.c_proj.theta
+ |---- visual.transformer.resblocks.0.mlp.c_proj.bias
+
+ (Same thing for [text] transformer.resblocks)
+
+ ```
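
The weight decomposition listed in the block above can be sketched in plain Python. This is a minimal, dependency-free illustration and not the training code from the linked repository: the helper names `decompose` and `recompose` are hypothetical, and it assumes the norm is taken per row of the weight matrix (one `r` value per output neuron), which the README does not state explicitly.

```python
import math

def decompose(weight):
    """Split each row of a weight matrix into a norm (r) and a unit direction (theta).

    Assumption: one norm per row, i.e. per output neuron.
    """
    r, theta = [], []
    for row in weight:
        norm = math.sqrt(sum(w * w for w in row))
        r.append(norm)
        # Guard against an all-zero row so we never divide by zero.
        theta.append([w / norm if norm > 0 else 0.0 for w in row])
    return r, theta

def recompose(r, theta):
    """Rebuild the weight matrix as r_i * theta_i / ||theta_i|| for each row i.

    theta is re-normalized here so that only its direction matters,
    while the magnitude is carried entirely by r.
    """
    weight = []
    for radius, direction in zip(r, theta):
        norm = math.sqrt(sum(d * d for d in direction))
        scale = radius / norm if norm > 0 else 0.0
        weight.append([scale * d for d in direction])
    return weight

# Toy 2x2 "pre-trained" weight matrix (illustrative values only).
W = [[3.0, 4.0], [0.0, 2.0]]
r, theta = decompose(W)
W_back = recompose(r, theta)
```

Under these assumptions, decomposing and then recomposing returns the original matrix up to floating-point error, which is the sense in which direction and magnitude are both preserved.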
+
+
+ ✅ The model / state_dict I am sharing was converted back to .weight after fine-tuning, so it can be used in the same manner as any state_dict, e.g. with ComfyUI as the SDXL / SD3 Text Encoder! 🤗
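
As a rough illustration of what "converted back to .weight" could look like, the sketch below merges each `.r` / `.theta` pair in a state_dict-like mapping into a single `.weight` entry. It uses nested Python lists instead of tensors so it runs without dependencies. The function name `gmp_to_weight` is hypothetical, and the per-row formula `weight = r * theta / ||theta||` is an assumption, not something confirmed by this README; see the linked repository for the actual conversion.

```python
import math

def gmp_to_weight(state):
    """Return a copy of `state` where every '<prefix>.r' / '<prefix>.theta'
    pair is merged into one '<prefix>.weight' entry.

    Assumption: weight row i = r[i] * theta[i] / ||theta[i]||.
    """
    merged = {}
    for key, value in state.items():
        if key.endswith(".theta"):
            prefix = key[: -len(".theta")]
            radii = state[prefix + ".r"]
            rows = []
            for radius, direction in zip(radii, value):
                norm = math.sqrt(sum(d * d for d in direction))
                scale = radius / norm if norm > 0 else 0.0
                rows.append([scale * d for d in direction])
            merged[prefix + ".weight"] = rows
        elif key.endswith(".r") and key[: -len(".r")] + ".theta" in state:
            continue  # consumed together with the matching theta entry
        else:
            merged[key] = value  # biases and other entries pass through unchanged
    return merged

# Toy GmP-style entries using the key names shown in the README (values are made up).
gmp_state = {
    "visual.transformer.resblocks.0.mlp.c_fc.r": [5.0],
    "visual.transformer.resblocks.0.mlp.c_fc.theta": [[0.6, 0.8]],
    "visual.transformer.resblocks.0.mlp.c_fc.bias": [0.1],
}
plain_state = gmp_to_weight(gmp_state)
```

After such a merge the key names match the "normal" CLIP layout (`c_fc.weight`, `c_fc.bias`), which is why a converted checkpoint can be loaded by tools that expect the original architecture.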
+
+ - ** For details on training and on the accuracy numbers / the eval, please see [https://github.com/zer0int/CLIP-fine-tune](https://github.com/zer0int/CLIP-fine-tune)
+ - -> You can use "exp-acts-ft-finetune-OpenAI-CLIP-ViT-L-14-GmP-manipulate-neurons.py" to replicate my exact model fine-tune.
+
+ Pre-trained CLIP model by OpenAI, License: [MIT License](https://github.com/openai/CLIP/blob/main/LICENSE)