Commit 622b2a4 by tarekziade (1 parent: bfcb944)

New training

.DS_Store ADDED
Binary file (6.15 kB). View file
 
README.md CHANGED
@@ -1,50 +1,81 @@
  ---
  tags:
- - generated_from_trainer
+ - image-to-text
+ - image-captioning
+ license: apache-2.0
+ metrics:
+ - rouge
+ datasets:
+ - nlphuji/flickr30k
+ widget:
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
+   example_title: Savanna
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
+   example_title: Football Match
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
+   example_title: Airport
+ base_model:
+ - google/vit-base-patch16-224-in21k
+
  model-index:
- - name: test-push
-   results: []
+ - name: mozilla/distilvit
+   results:
+   - task:
+       type: image-to-text
+       name: Image To Text
+     dataset:
+       name: nlphuji/flickr30k
+       type: nlphuji/flickr30k
+     metrics:
+     - name: ROUGE-1
+       type: rouge
+       value: 43.006
+       verified: true
+     - name: ROUGE-2
+       type: rouge
+       value: 16.9939
+       verified: true
+     - name: ROUGE-L
+       type: rouge
+       value: 38.8923
+       verified: true
+     - name: ROUGE-LSUM
+       type: rouge
+       value: 38.8877
+       verified: true
+     - name: loss
+       type: loss
+       value: 0.19939416646957397
+     - name: gen_len
+       type: gen_len
+       value: 11.327256736227712
+       verified: true
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # test-push
-
- This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
+ # distilvit
 
- ## Training and evaluation data
+ This model is a work in progress. Fine-tuned version of those base models:
 
- More information needed
+ - a VIT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
+ - a Distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
 
- ## Training procedure
+ This model was trained on:
 
- ### Training hyperparameters
+ - Flickr30k : https://huggingface.co/datasets/nlphuji/flickr30k
+ - COCO 2017: https://cocodataset.org
 
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 50
- - eval_batch_size: 50
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 1
+ You can get that checkpoint using the 3083a3cef6e3c8dd90df3f088074bbe836b0f403 commit.
 
- ### Training results
+ It was then further fine-tuned on :
 
+ - Flickr30k debiased: https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions
+ - DocOrNot: https://huggingface.co/datasets/Mozilla/docornot
 
+ You can find the code used to create the model here: https://github.com/mozilla/distilvit
 
  ### Framework versions
 
- - Transformers 4.33.2
- - Pytorch 2.3.1
- - Datasets 2.20.0
- - Tokenizers 0.13.3
+ - Transformers 4.40.2
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
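The updated model card describes a ViT image encoder paired with a DistilGPT-2 text decoder and tags the model as `image-to-text`, but it does not show how to run it. The following is a minimal sketch, not taken from the repository: it assumes the checkpoint is reachable under the `mozilla/distilvit` id used in the `model-index` block, and that `transformers` and `Pillow` are installed. The helper name `caption_image` is hypothetical.

```python
# Assumption: the repository id matches the "name" field in model-index.
MODEL_ID = "mozilla/distilvit"


def caption_image(image, model_id=MODEL_ID):
    """Return one generated caption for an image path, URL, or PIL image.

    transformers is imported lazily so that importing this module does not
    require the library (or a network connection) until a caption is requested.
    """
    from transformers import pipeline

    # "image-to-text" is the pipeline task matching the tags in the model card.
    captioner = pipeline("image-to-text", model=model_id)
    outputs = captioner(image)
    # The pipeline returns a list of dicts with a "generated_text" key.
    return outputs[0]["generated_text"]


# Example call (downloads the model weights on first use), using one of the
# widget images listed in the model card metadata:
#
#   caption_image(
#       "https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg"
#   )
```

Building the pipeline inside the function keeps the sketch short; code that captions many images would likely create the pipeline once and reuse it.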
config.json CHANGED
@@ -1,4 +1,5 @@
  {
+   "_name_or_path": "/Users/tarekziade/Dev/distilvit/distilvit/../vit-base-patch16-224-distilgpt2",
    "architectures": [
      "VisionEncoderDecoderModel"
    ],
@@ -2173,6 +2174,5 @@
    "model_type": "vision-encoder-decoder",
    "pad_token_id": 50256,
    "tie_word_embeddings": false,
-   "torch_dtype": "float32",
    "transformers_version": "4.33.2"
  }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
onnx/decoder_model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1407c3bf1f40883552e100b56dede3ae3f5028169f3b71d520390a049418ab9e
+ size 385864797
onnx/decoder_model_merged.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:088ae054ab9b988b348314ea9e9c87966c43c0f6699180aa80c1fe67fc3a5089
+ size 387342586
onnx/decoder_model_merged_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a557dd52eb6662df9fbfc7c6eb70acc6f4e3b7e05139131de99c399cc7233c0d
+ size 99759578
onnx/decoder_model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:534c5578ebe765a98a8ecd4526b6ba7a86054a63bd7ddb07f4df00824f9eeacb
+ size 98065762
onnx/decoder_with_past_model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:420f2371373ba5a0347e9fc0fe7a6b8ee66b99a37a71b1fc9aca83935639b44a
+ size 385864377
onnx/decoder_with_past_model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c46d9d9457d906cf1912d8a8a15d44c934558874b1ccdff5fadea7afb04cb96
+ size 98063169
onnx/encoder_model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:baf53cebbcc6d277bb1874d78e729e591a81334f90ad96a9b2b99a60836c8dca
+ size 343440632
onnx/encoder_model_quantized.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e3c7495809ab8fa02188d486192b1a6a5ad3d33bc5c91dfe795e08d4603692b
+ size 87038172
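The `size` fields in the ONNX pointers above make it possible to compare each float export with its quantized counterpart. The sketch below only does arithmetic on the byte counts listed in this commit; the variable and function names are illustrative. The result of roughly a quarter of the original size is what one would expect if 32-bit float weights are stored as 8-bit integers, though the commit itself does not state this.

```python
# Byte sizes copied from the Git LFS pointers in this commit:
# (full-precision export, quantized export)
ONNX_SIZES = {
    "decoder_model": (385864797, 98065762),
    "decoder_model_merged": (387342586, 99759578),
    "decoder_with_past_model": (385864377, 98063169),
    "encoder_model": (343440632, 87038172),
}


def compression_ratio(full_bytes, quantized_bytes):
    """How many times smaller the quantized file is than the full export."""
    return full_bytes / quantized_bytes


RATIOS = {
    name: compression_ratio(full, quantized)
    for name, (full, quantized) in ONNX_SIZES.items()
}

for name, ratio in RATIOS.items():
    full, quantized = ONNX_SIZES[name]
    print(
        f"{name}: {full / 1e6:.1f} MB -> {quantized / 1e6:.1f} MB "
        f"({ratio:.2f}x smaller)"
    )
```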
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:92b15e097c140fe56e3cce918035e1bd564727b3e0cb0ec3ddbdebdb1f702572
+ oid sha256:1b26ea0227ceb2870c6d45cf7000830cea7d5a0727d6996941d1564195d5a2e6
  size 730047834
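For `pytorch_model.bin` the diff shows only a Git LFS pointer: the `oid` changes while the `size` stays at 730047834 bytes, so the new weights have the same byte length but different content. A minimal sketch of how such a pointer can be parsed and checked against downloaded bytes follows. The function names are hypothetical, and the check reads the whole payload into memory, which a real verification of a 730 MB file would likely replace with chunked hashing.

```python
import hashlib

# Pointer text for the new pytorch_model.bin, as shown in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1b26ea0227ceb2870c6d45cf7000830cea7d5a0727d6996941d1564195d5a2e6
size 730047834
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest the pointer names."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


parsed = parse_lfs_pointer(POINTER)
print(parsed["algo"], parsed["size"])
```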
quantize_config.json ADDED
@@ -0,0 +1,125 @@
+ {
+     "per_channel": false,
+     "reduce_range": false,
+     "per_model_config": {
+         "decoder_with_past_model": {
+             "op_types": [
+                 "Gemm",
+                 "Slice",
+                 "Transpose",
+                 "Constant",
+                 "Range",
+                 "Pow",
+                 "Unsqueeze",
+                 "Where",
+                 "Softmax",
+                 "Concat",
+                 "ReduceMean",
+                 "Sub",
+                 "Split",
+                 "Shape",
+                 "MatMul",
+                 "ConstantOfShape",
+                 "Div",
+                 "Tanh",
+                 "Reshape",
+                 "Squeeze",
+                 "Add",
+                 "Gather",
+                 "Sqrt",
+                 "Mul",
+                 "Cast"
+             ],
+             "weight_type": "QInt8"
+         },
+         "decoder_model": {
+             "op_types": [
+                 "Gemm",
+                 "Slice",
+                 "Transpose",
+                 "Constant",
+                 "Range",
+                 "Pow",
+                 "Unsqueeze",
+                 "Where",
+                 "Softmax",
+                 "Concat",
+                 "ReduceMean",
+                 "Sub",
+                 "Split",
+                 "Shape",
+                 "MatMul",
+                 "ConstantOfShape",
+                 "Div",
+                 "Tanh",
+                 "Reshape",
+                 "Squeeze",
+                 "Add",
+                 "Gather",
+                 "Sqrt",
+                 "Mul",
+                 "Cast"
+             ],
+             "weight_type": "QInt8"
+         },
+         "encoder_model": {
+             "op_types": [
+                 "Equal",
+                 "Slice",
+                 "Transpose",
+                 "Constant",
+                 "Pow",
+                 "Unsqueeze",
+                 "Where",
+                 "Softmax",
+                 "Concat",
+                 "ReduceMean",
+                 "Sub",
+                 "Shape",
+                 "MatMul",
+                 "ConstantOfShape",
+                 "Conv",
+                 "Div",
+                 "Erf",
+                 "Reshape",
+                 "Expand",
+                 "Add",
+                 "Gather",
+                 "Sqrt",
+                 "Mul"
+             ],
+             "weight_type": "QUInt8"
+         },
+         "decoder_model_merged": {
+             "op_types": [
+                 "Gemm",
+                 "Slice",
+                 "Transpose",
+                 "Constant",
+                 "Range",
+                 "If",
+                 "Pow",
+                 "Unsqueeze",
+                 "Where",
+                 "Softmax",
+                 "Concat",
+                 "ReduceMean",
+                 "Sub",
+                 "Split",
+                 "Shape",
+                 "MatMul",
+                 "ConstantOfShape",
+                 "Div",
+                 "Tanh",
+                 "Reshape",
+                 "Squeeze",
+                 "Add",
+                 "Gather",
+                 "Sqrt",
+                 "Mul",
+                 "Cast"
+             ],
+             "weight_type": "QInt8"
+         }
+     }
+ }
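The quantization config records, per exported graph, the ONNX operator types it contains and the weight type that was chosen. The sketch below embeds only the `encoder_model` and `decoder_model` entries from the file above and compares them; the helper names are hypothetical. The `follows_conv_rule` check encodes an assumption inferred from this data alone (graphs containing `Conv` use `QUInt8`, the others `QInt8`) and is not confirmed by the commit as the rule the export script applies.

```python
import json

# Abbreviated copy of quantize_config.json: two of the four per-model entries.
QUANTIZE_CONFIG = json.loads("""
{
  "per_channel": false,
  "reduce_range": false,
  "per_model_config": {
    "decoder_model": {
      "op_types": ["Gemm", "Slice", "Transpose", "Constant", "Range", "Pow",
                   "Unsqueeze", "Where", "Softmax", "Concat", "ReduceMean",
                   "Sub", "Split", "Shape", "MatMul", "ConstantOfShape", "Div",
                   "Tanh", "Reshape", "Squeeze", "Add", "Gather", "Sqrt",
                   "Mul", "Cast"],
      "weight_type": "QInt8"
    },
    "encoder_model": {
      "op_types": ["Equal", "Slice", "Transpose", "Constant", "Pow",
                   "Unsqueeze", "Where", "Softmax", "Concat", "ReduceMean",
                   "Sub", "Shape", "MatMul", "ConstantOfShape", "Conv", "Div",
                   "Erf", "Reshape", "Expand", "Add", "Gather", "Sqrt", "Mul"],
      "weight_type": "QUInt8"
    }
  }
}
""")


def op_type_diff(config, model_a, model_b):
    """Return (ops only in model_a, ops only in model_b), each sorted."""
    models = config["per_model_config"]
    ops_a = set(models[model_a]["op_types"])
    ops_b = set(models[model_b]["op_types"])
    return sorted(ops_a - ops_b), sorted(ops_b - ops_a)


def follows_conv_rule(config):
    """True if every graph uses QUInt8 exactly when it contains a Conv op.

    This rule is an assumption inferred from the config, not a documented fact.
    """
    return all(
        (entry["weight_type"] == "QUInt8") == ("Conv" in entry["op_types"])
        for entry in config["per_model_config"].values()
    )


encoder_only, decoder_only = op_type_diff(
    QUANTIZE_CONFIG, "encoder_model", "decoder_model"
)
print("encoder only:", encoder_only)
print("decoder only:", decoder_only)
print("consistent with Conv rule:", follows_conv_rule(QUANTIZE_CONFIG))
```

The encoder-only operators include `Conv`, which fits a ViT patch-embedding layer, and `Erf`, while the decoder lists `Tanh` instead.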
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 1024,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>"
+ }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:75027ebf1d2059df780db4f160a3b3b8bff13fad52e4cec941ead5253f74ed00
+ oid sha256:c0a2ac50f309f8c9847a82159d9a9ac78e7a2325898793a1789b7a803a96a996
  size 4728
vocab.json ADDED
The diff for this file is too large to render. See raw diff