---
license: mit
datasets:
- SPRIGHT-T2I/spright_coco
---
## A fine-tune of OpenAI / CLIP ViT-L/14 that has an unprecedented ImageNet/ObjectNet accuracy of ~0.90 (original pre-trained model / OpenAI's CLIP: ~0.85)**

Made possible with Geometric Parametrization (GmP):

```
"Normal" CLIP MLP (multi-layer perceptron):

(mlp): Sequential(
  (c_fc): Linear(in_features=1024, out_features=4096, bias=True)
  (gelu): QuickGELU()
  (c_proj): Linear(in_features=4096, out_features=1024, bias=True)
)
  |
  |-- visual.transformer.resblocks.0.mlp.c_fc.weight
  |-- visual.transformer.resblocks.0.mlp.c_fc.bias
  |
  |-- visual.transformer.resblocks.0.mlp.c_proj.weight
  |-- visual.transformer.resblocks.0.mlp.c_proj.bias


GmP CLIP MLP:

Weight decomposition into:
- radial component 'r' as norm of pre-trained weights
- angular component 'theta' as normalized direction
-> preserves weight vectors' directionality and magnitude

(mlp): Sequential(
  (c_fc): GeometricLinear()
  (gelu): QuickGELU()
  (c_proj): GeometricLinear()
)
  |
  |-- visual.transformer.resblocks.0.mlp.c_fc.r
  |-- visual.transformer.resblocks.0.mlp.c_fc.theta
  |-- visual.transformer.resblocks.0.mlp.c_fc.bias
  |
  |-- visual.transformer.resblocks.0.mlp.c_proj.r
  |-- visual.transformer.resblocks.0.mlp.c_proj.theta
  |-- visual.transformer.resblocks.0.mlp.c_proj.bias

(Same thing for [text] transformer.resblocks)
```
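In other words, each Linear layer's weight matrix is split into per-output-row norms ('r') and unit directions ('theta'). A minimal PyTorch sketch of such a layer, assuming the name `GeometricLinear` and a row-wise decomposition (the author's actual implementation in the linked repo may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricLinear(nn.Module):
    """Sketch of a GmP linear layer: the pre-trained weight is decomposed
    into a radial part 'r' (per-output-row norm) and an angular part
    'theta' (the normalized direction). Details are assumptions based on
    the description above, not the author's exact code."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.data                # (out_features, in_features)
        norm = w.norm(dim=1, keepdim=True)    # row-wise magnitude
        self.r = nn.Parameter(norm.clone())   # radial component
        self.theta = nn.Parameter(w / norm)   # angular component (unit rows)
        self.bias = nn.Parameter(linear.bias.data.clone())

    def weight(self) -> torch.Tensor:
        # Recompose a standard .weight; re-normalizing theta keeps all
        # magnitude in r even after gradient updates change theta's norm.
        return self.r * (self.theta / self.theta.norm(dim=1, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight(), self.bias)
```

At initialization the recomposed weight equals the original, so swapping `GeometricLinear` in for `c_fc` / `c_proj` leaves the model's outputs unchanged; fine-tuning then updates magnitude and direction as separate parameters.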

✅ The model / state_dict I am sharing was converted back to .weight after fine-tuning - thus, it can be used in the same manner as any state_dict, e.g. for use with ComfyUI as the SDXL / SD3 Text Encoder! 🤗
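That "converted back to .weight" step can be sketched as a pure state_dict transform that folds each r/theta pair into a standard weight tensor (function name and key handling are illustrative assumptions, not the author's script):

```python
import torch

def recompose_state_dict(gmp_sd: dict) -> dict:
    """Fold each (r, theta) pair in a GmP state_dict back into a standard
    '.weight' tensor, passing all other entries through untouched. Key
    suffixes follow the parameter names shown above."""
    out = {}
    for key, value in gmp_sd.items():
        if key.endswith(".r"):
            base = key[: -len(".r")]             # e.g. "...mlp.c_fc"
            theta = gmp_sd[base + ".theta"]
            direction = theta / theta.norm(dim=1, keepdim=True)
            out[base + ".weight"] = value * direction
        elif key.endswith(".theta"):
            continue                             # consumed with its ".r"
        else:
            out[key] = value
    return out
```

The result has exactly the key layout of a vanilla CLIP state_dict, which is why it loads anywhere a normal checkpoint does.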

- ** For details on training and those numbers / the eval, please see [https://github.com/zer0int/CLIP-fine-tune](https://github.com/zer0int/CLIP-fine-tune)
- -> You can use "exp-acts-ft-finetune-OpenAI-CLIP-ViT-L-14-GmP-manipulate-neurons.py" to replicate my exact model fine-tune.

Pre-trained CLIP model by OpenAI, License: [MIT License](https://github.com/openai/CLIP/blob/main/LICENSE)