infgrad/jasper_en_vision_language_v1

@@ -8983,6 +8983,25 @@ The core training code will be integrated into the rag-retrieval library(https:/
 This work was accomplished during my free time; please give me a little time.
 ## Usage
 ```python
 import torch
@@ -9048,5 +9067,9 @@ if __name__ == "__main__":
     #         [0.3226, 0.3054, 0.7421, 0.5484]])
 ```
 ## License
 **This model should not be used for any commercial purpose!**

 This work was accomplished during my free time; please give me a little time.
+Here is a short introduction to the training method.
+The core idea of jasper and stella is distillation: **let the student model learn the teacher model's vectors.**
+Training jasper has four stages:
+Stage 1 & 2: Distill from the teacher vectors. For jasper, the teacher models are nvidia/NV-Embed-v2 and dunzhang/stella_en_1.5B_v5. (Stage 1 and Stage 2 freeze different parameters.)
+Stage 3: MRL (Matryoshka Representation Learning) training. I modified MRL so that it can be trained on unsupervised text.
+Stage 4: Alignment between *jasper token embeddings from an image's detailed caption* and *vision embeddings from google/siglip-so400m-patch14-384*.
+I use an AdaptiveAvgPool2d to adjust the number and dimension of the vision tokens; this method needs no additional parameters.
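
The pooling step described for Stage 4 can be sketched as follows. This is a minimal illustration, not the model's actual code: the token counts and dimensions below are assumed values chosen for the example, and `resize_vision_tokens` is a hypothetical helper name.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not read from the jasper config).
NUM_VISION_TOKENS, VISION_DIM = 729, 1152   # e.g. a 27 x 27 patch grid
NUM_TARGET_TOKENS, TARGET_DIM = 300, 1536   # hypothetical target shape

# AdaptiveAvgPool2d has no learnable weights; it only averages over windows.
pool = nn.AdaptiveAvgPool2d((NUM_TARGET_TOKENS, TARGET_DIM))


def resize_vision_tokens(vision_tokens: torch.Tensor) -> torch.Tensor:
    """Resize (batch, tokens, dim) to (batch, NUM_TARGET_TOKENS, TARGET_DIM).

    A 3-D input is interpreted as (C, H, W), so each batch item is pooled
    independently over its token axis and its feature axis.
    """
    return pool(vision_tokens)


vision_tokens = torch.randn(2, NUM_VISION_TOKENS, VISION_DIM)
resized = resize_vision_tokens(vision_tokens)
num_params = sum(p.numel() for p in pool.parameters())
print(resized.shape, num_params)  # torch.Size([2, 300, 1536]) 0
```

Because the layer only averages, the same module can shrink the token axis and stretch the feature axis in one call, which is why no projection weights are needed.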
+**The point of distillation is to achieve better results with smaller models, or to serve as a form of pre-training, not to top the leaderboards.**
+In fact, I have reached first place on MTEB (Chinese and English), but I will not release those two models because, as I said before, doing so is meaningless.
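
To make "let the student model learn the teacher model's vectors" concrete, here is a rough sketch of what such objectives might look like. The note above does not specify the losses, so everything here is an assumption: a cosine loss for direct vector matching, and a pairwise-similarity loss applied to vector prefixes as one possible way to do MRL-style training on unlabeled text. The function names and the `dims` values are hypothetical.

```python
import torch
import torch.nn.functional as F


def cosine_distill_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Mean of 1 - cos(student_i, teacher_i); both inputs must share a dimension."""
    return (1.0 - F.cosine_similarity(student, teacher, dim=-1)).mean()


def similarity_distill_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Match in-batch pairwise cosine similarities.

    Only the (batch x batch) similarity matrices are compared, so the student
    and teacher vectors may have different dimensions and no labels are needed.
    """
    s = F.normalize(student, dim=-1)
    t = F.normalize(teacher, dim=-1)
    return F.mse_loss(s @ s.T, t @ t.T)


def matryoshka_distill_loss(student, teacher, dims=(256, 512, 1024)):
    """Average the similarity loss over leading prefixes of the student vector."""
    losses = [similarity_distill_loss(student[:, :d], teacher) for d in dims]
    return torch.stack(losses).mean()


torch.manual_seed(0)
teacher = torch.randn(8, 1024)                      # stand-in for precomputed teacher vectors
student = torch.randn(8, 1024, requires_grad=True)  # stand-in for student model outputs
loss = cosine_distill_loss(student, teacher) + matryoshka_distill_loss(student, teacher)
loss.backward()  # gradients flow only to the student side
```

Since neither loss needs labels, any text corpus for which teacher vectors can be computed could in principle serve as training data under this sketch.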
 ## Usage
 ```python
 import torch
     #         [0.3226, 0.3054, 0.7421, 0.5484]])
 ```
+## Evaluation on MTEB
+Evaluation script: ./scripts/evaluate_en_mteb/run_evaluate_mteb.py
 ## License
 **This model should not be used for any commercial purpose!**