---
license: mit
datasets:
- SPRIGHT-T2I/spright_coco
---
## A fine-tune of OpenAI / CLIP ViT-L/14 that has an unprecedented ImageNet/ObjectNet accuracy of ~0.90 (original pre-trained model / OpenAI's CLIP: ~0.85)**

Made possible with Geometric Parametrization (GmP):

```
"Normal" CLIP MLP (multi-layer perceptron):

(mlp): Sequential(
  (c_fc): Linear(in_features=1024, out_features=4096, bias=True)
  (gelu): QuickGELU()
  (c_proj): Linear(in_features=4096, out_features=1024, bias=True)
)
  |
  |-- visual.transformer.resblocks.0.mlp.c_fc.weight
  |-- visual.transformer.resblocks.0.mlp.c_fc.bias
  |
  |-- visual.transformer.resblocks.0.mlp.c_proj.weight
  |-- visual.transformer.resblocks.0.mlp.c_proj.bias


GmP CLIP MLP:

Weight decomposition into:
- radial component 'r' as norm of pre-trained weights
- angular component 'theta' as normalized direction
-> preserves weight vectors' directionality and magnitude

(mlp): Sequential(
  (c_fc): GeometricLinear()
  (gelu): QuickGELU()
  (c_proj): GeometricLinear()
)
  |
  |-- visual.transformer.resblocks.0.mlp.c_fc.r
  |-- visual.transformer.resblocks.0.mlp.c_fc.theta
  |-- visual.transformer.resblocks.0.mlp.c_fc.bias
  |
  |-- visual.transformer.resblocks.0.mlp.c_proj.r
  |-- visual.transformer.resblocks.0.mlp.c_proj.theta
  |-- visual.transformer.resblocks.0.mlp.c_proj.bias

(Same thing for [text] transformer.resblocks)
```
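In other words, each Linear layer's weight matrix is split into per-output-row norms ('r') and unit directions ('theta'). A minimal PyTorch sketch of such a layer, assuming the name `GeometricLinear` and a row-wise decomposition (the author's actual implementation in the linked repo may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricLinear(nn.Module):
    """Sketch of a GmP linear layer: the pre-trained weight is decomposed
    into a radial part 'r' (per-output-row norm) and an angular part
    'theta' (the normalized direction). Details are assumptions based on
    the description above, not the author's exact code."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.data                # (out_features, in_features)
        norm = w.norm(dim=1, keepdim=True)    # row-wise magnitude
        self.r = nn.Parameter(norm.clone())   # radial component
        self.theta = nn.Parameter(w / norm)   # angular component (unit rows)
        self.bias = nn.Parameter(linear.bias.data.clone())

    def weight(self) -> torch.Tensor:
        # Recompose a standard .weight; re-normalizing theta keeps all
        # magnitude in r even after gradient updates change theta's norm.
        return self.r * (self.theta / self.theta.norm(dim=1, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight(), self.bias)
```

At initialization the recomposed weight equals the original, so swapping `GeometricLinear` in for `c_fc` / `c_proj` leaves the model's outputs unchanged; fine-tuning then updates magnitude and direction as separate parameters.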

✅ The model / state_dict I am sharing was converted back to .weight after fine-tuning - thus, it can be used in the same manner as any state_dict, e.g. for use with ComfyUI as the SDXL / SD3 Text Encoder! 🤗
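That "converted back to .weight" step can be sketched as a pure state_dict transform that folds each r/theta pair into a standard weight tensor (function name and key handling are illustrative assumptions, not the author's script):

```python
import torch

def recompose_state_dict(gmp_sd: dict) -> dict:
    """Fold each (r, theta) pair in a GmP state_dict back into a standard
    '.weight' tensor, passing all other entries through untouched. Key
    suffixes follow the parameter names shown above."""
    out = {}
    for key, value in gmp_sd.items():
        if key.endswith(".r"):
            base = key[: -len(".r")]             # e.g. "...mlp.c_fc"
            theta = gmp_sd[base + ".theta"]
            direction = theta / theta.norm(dim=1, keepdim=True)
            out[base + ".weight"] = value * direction
        elif key.endswith(".theta"):
            continue                             # consumed with its ".r"
        else:
            out[key] = value
    return out
```

The result has exactly the key layout of a vanilla CLIP state_dict, which is why it loads anywhere a normal checkpoint does.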

- ** For details on training and those numbers / the eval, please see [https://github.com/zer0int/CLIP-fine-tune](https://github.com/zer0int/CLIP-fine-tune)
- -> You can use "exp-acts-ft-finetune-OpenAI-CLIP-ViT-L-14-GmP-manipulate-neurons.py" to replicate my exact model fine-tune.

Pre-trained CLIP model by OpenAI, License: [MIT License](https://github.com/openai/CLIP/blob/main/LICENSE)