---
tags:
- image-to-text
- image-captioning
license: apache-2.0
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
base_model:
- distilbert/distilgpt2
---

A variation of [tarekziade/distilvit](https://huggingface.co/tarekziade/distilvit).

Trained on 270k images from Flickr10k and COCO.

Training source code: https://github.com/tarekziade/distilvit
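
A minimal usage sketch with the Hugging Face `transformers` `pipeline` API. It assumes `transformers`, `torch` and `Pillow` are installed and that this checkpoint loads as a standard `image-to-text` model. This card does not state its own Hub id, so `MODEL_ID` points at the parent `tarekziade/distilvit` repository as a placeholder; replace it with this model's id. The helper names `load_captioner` and `caption` are hypothetical.

```python
from functools import lru_cache

# Placeholder: the parent model this card is a variation of.
# Replace with this model's own Hub id (not stated on this card).
MODEL_ID = "tarekziade/distilvit"


@lru_cache(maxsize=None)
def load_captioner(model_id: str):
    """Build (once per model id) an image-to-text pipeline."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("image-to-text", model=model_id)


def caption(image: str, model_id: str = MODEL_ID) -> str:
    """Return a caption for an image given as a URL or local file path."""
    outputs = load_captioner(model_id)(image)
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

For example, `caption("https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg")` downloads the model on first use and returns a short English caption; `lru_cache` keeps the pipeline in memory for later calls.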

Results:

- eval_loss: 0.2305169701576233
- eval_rouge1: 39.511
- eval_rouge2: 14.7798
- eval_rougeL: 35.9476
- eval_rougeLsum: 35.9497
- eval_gen_len: 11.695219762592236
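
The ROUGE scores above appear to be on a 0 to 100 scale (39.511 corresponds to an F1 of about 0.395). As a rough illustration of what `eval_rouge1` measures, the sketch below computes ROUGE-1 F1, the unigram overlap between a generated caption and a reference caption, for a single pair. It is a simplification under stated assumptions: it lowercases and splits on whitespace, whereas the `rouge_score` package commonly used by Hugging Face training scripts applies its own tokenizer and optional stemming and aggregates over the whole evaluation set, so its numbers will not match exactly. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1 on lowercase whitespace tokens (simplified sketch)."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram overlap: each token counts at most min(count in pred, count in ref).
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For the pair "a dog runs on the grass" and "a brown dog running on grass", four of the six unigrams on each side overlap ("a", "dog", "on", "grass"), so precision = recall = 4/6 and F1 ≈ 0.667, which would be reported as about 66.7 on the scale used above.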