weifeng-chen committed on
Commit 901bc96
1 Parent(s): ec67cfd

add zero dataset and achieve better result

Files changed (2)
  1. README.md +4 -4
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 
  # Model Details
 
- This model is a Chinese CLIP model trained on [Noah-Wukong Dataset](https://wukong-dataset.github.io/wukong-dataset/), which contains about 100M Chinese image-text pairs. We use ViT-B-32 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was trained for 20 epochs and it takes about 10 days with 8 A100 GPUs.
+ This model is a Chinese CLIP model trained on [Noah-Wukong Dataset(100M)](https://wukong-dataset.github.io/wukong-dataset/) and [Zero(23M)](https://zero.so.com/). We use ViT-B-32 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was trained for 24 epochs and it takes about 10 days with 16 A100 GPUs.
 
  # Taiyi (太乙)
  Taiyi models are a branch of the Fengshenbang (封神榜) series of models. The models in Taiyi are pre-trained with multimodal pre-training strategies. We will release more image-text model trained on Chinese dataset and benefit the Chinese community.
@@ -65,14 +65,14 @@ with torch.no_grad():
  ### Zero-Shot Classification
  | model | dataset | Top1 | Top5 |
  | ---- | ---- | ---- | ---- |
- | Taiyi-CLIP-Roberta-102M-Chinese | ImageNet1k-CN | 41.00% | 69.19% |
+ | Taiyi-CLIP-Roberta-102M-Chinese | ImageNet1k-CN | 42.85% | 71.48% |
 
  ### Zero-Shot Text-to-Image Retrieval
 
  | model | dataset | Top1 | Top5 | Top10 |
  | ---- | ---- | ---- | ---- | ---- |
- | Taiyi-CLIP-Roberta-102M-Chinese | Flickr30k-CNA-test | 44.06% | 71.42% | 80.84% |
- | Taiyi-CLIP-Roberta-102M-Chinese | COCO-CN-test | 46.24% | 78.06% | 88.88% |
+ | Taiyi-CLIP-Roberta-102M-Chinese | Flickr30k-CNA-test | 46.32% | 74.58% | 83.44% |
+ | Taiyi-CLIP-Roberta-102M-Chinese | COCO-CN-test | 47.10% | 78.53% | 87.84% |
  | Taiyi-CLIP-Roberta-102M-Chinese | wukong50k | 48.67% | 81.77% | 90.09% |
 
 
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:53ec5505ee1ce25f970c5ce488bbd49b5727c36faa2132de0f2cf82dddbf3e37
+ oid sha256:d679dcce5801d600bce716e1fa3e13508812b9cb4ff0ff6101d12a96b3a4eae9
  size 410713709
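
The `pytorch_model.bin` change is a Git LFS pointer update, not the weights themselves: the pointer records the SHA-256 of the real file (`oid`) and its size in bytes. A minimal sketch of checking a downloaded file against such a pointer, assuming Python with only the standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or any Hugging Face tooling.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash the pointer records."""
    # Cheap check first: a size mismatch avoids hashing a large file.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so a ~400 MB checkpoint is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents after this commit, copied from the diff.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d679dcce5801d600bce716e1fa3e13508812b9cb4ff0ff6101d12a96b3a4eae9
size 410713709
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local download would then indicate whether it corresponds to this commit. The file size is identical before and after the commit (410713709 bytes), so only the `oid` distinguishes the retrained weights from the previous ones.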