outsu committed on
Commit 206ebe3 · verified · 1 Parent(s): 9ce6246

TeLVE v1.0dep released. Not recommended for use: because of an addressing problem during training, it was trained on a dataset of only about half the size.

Files changed (2)
  1. README.md +80 -79
  2. models/TeLVE_v1.0dep.pth +3 -0
README.md CHANGED
@@ -1,79 +1,80 @@
- ---
- license: cc-by-4.0
- language:
- - en
- - tr
- tags:
- - VLM
- - image2text
- - lm
- ---
- # TeLVE: Turkish efficient Language Vision Engine 🧿
- [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
- [![Models: v1.0](https://img.shields.io/badge/Models-v1.0-blue)](https://huggingface.co/outsu/TeLVE)
- ## First Turkish VLM ever!
-
- TeLVE is the first Visual Language Model specifically designed for Turkish language understanding and image description generation. Built on Vision Transformer (ViT) and BERT pre-trained encoder architectures, it bridges the gap in Turkish visual-linguistic processing.
- No module named 'imagine'
- ![TeLVE logo](<teLVE_logo.png>)
-
- ## Model Description
-
- TeLVE combines:
- - 🖼️ Vision Transformer (ViT-base-patch16-224)
- - 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- - 🔄 Cross-attention mechanism for vision-language fusion
-
- ### Version Logs
- - **TeLVE v1.0**: Trained on Unsplash Lite dataset
-
- ## Usage
-
- The model can be used in two ways:
-
- ### Inference (imagine.py)
- ```python
- # Generate captions for images
- python imagine.py
- ```
- This script:
- - Loads a trained TeLVE model
- - Takes images from `images` directory
- - Generates Turkish captions for each image
- - Outputs the results to console
-
- ### Training (main.py)
- Users can train their own models with ViT and BERT encoders.
- ```python
- # Train a new model
- python main.py
- ```
-
- This script:
- - Loads and preprocesses image-caption pairs
- - Initializes ViT and BERT encoders
- - Trains the combined model
- - Saves the model and tokenizer
-
-
- ## Performance
- Performance scores will be evaluated.
- <!--
- | Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
- |--------------|---------|---------|---------|--------|
- | TeLVE v1.0 | Unsplash | *TBD* | *TBD* | *TBD* |
- | TeLVE v1.1 | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |-->
-
- ## Citation
-
- ```bibtex
- @software{telve2024,
- author = {Öğüt Su Karagün},
- title = {TeLVE: Turkish efficient Language Vision Engine},
- year = {2024},
- url = {https://huggingface.co/outsu/TeLVE}
- }
- ```
-
- ## License
- This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
+ ---
+ license: cc-by-4.0
+ language:
+ - en
+ - tr
+ tags:
+ - VLM
+ - image2text
+ - lm
+ ---
+ # TeLVE: Turkish efficient Language Vision Engine 🧿
+ [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
+ [![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
+ ## First Turkish VLM ever!
+
+ TeLVE is the first Visual Language Model specifically designed for Turkish language understanding and image description generation. Built on Vision Transformer (ViT) and BERT pre-trained encoder architectures, it bridges the gap in Turkish visual-linguistic processing.
+ No module named 'imagine'
+ ![TeLVE logo](<teLVE_logo.png>)
+
+ ## Model Description
+
+ TeLVE combines:
+ - 🖼️ Vision Transformer (ViT-base-patch16-224)
+ - 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
+ - 🔄 Cross-attention mechanism for vision-language fusion
+
+ ### Version Logs
+ - **TeLVE v1.0**: Trained on Unsplash Lite dataset
+ - **TeLVE v1.0dep**: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem. Not recommended for use.)*
+
+ ## Usage
+
+ The model can be used in two ways:
+
+ ### Inference (imagine.py)
+ ```python
+ # Generate captions for images
+ python imagine.py
+ ```
+ This script:
+ - Loads a trained TeLVE model
+ - Takes images from `images` directory
+ - Generates Turkish captions for each image
+ - Outputs the results to console
+
+ ### Training (main.py)
+ Users can train their own models with ViT and BERT encoders.
+ ```python
+ # Train a new model
+ python main.py
+ ```
+
+ This script:
+ - Loads and preprocesses image-caption pairs
+ - Initializes ViT and BERT encoders
+ - Trains the combined model
+ - Saves the model and tokenizer
+
+
+ ## Performance
+ Performance scores will be evaluated.
+ <!--
+ | Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
+ |--------------|---------|---------|---------|--------|
+ | TeLVE v1.0 | Unsplash | *TBD* | *TBD* | *TBD* |
+ | TeLVE v1.1 | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |-->
+
+ ## Citation
+
+ ```bibtex
+ @software{telve2024,
+ author = {Öğüt Su Karagün},
+ title = {TeLVE: Turkish efficient Language Vision Engine},
+ year = {2024},
+ url = {https://huggingface.co/outsu/TeLVE}
+ }
+ ```
+
+ ## License
+ This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
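
Note on the README above: it names a cross-attention mechanism for vision-language fusion, but this diff does not show how TeLVE implements it. As a rough illustration only, the sketch below computes single-head scaled dot-product cross-attention in pure Python, with text-token vectors as queries and image-patch vectors as keys and values. The function names, the toy vectors, and the omission of learned projection matrices are all assumptions made for this example; it is not TeLVE's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_queries, image_keys, image_values):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    text_queries: one vector per text token (the queries)
    image_keys, image_values: one vector per image patch
    Returns (outputs, weights): one output vector and one weight row per query.
    """
    dim = len(image_keys[0])
    outputs, weights = [], []
    for query in text_queries:
        # Similarity of this text token to every image patch, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in image_keys
        ]
        row = softmax(scores)
        # Weighted average of the patch value vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(row, image_values))
            for i in range(len(image_values[0]))
        ]
        outputs.append(mixed)
        weights.append(row)
    return outputs, weights


# Toy inputs (assumed for illustration): 2 text tokens, 3 image patches, dimension 2.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = cross_attention(text_queries, image_patches, image_patches)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how strongly one text token attends to each image patch. A real model would normally apply learned query, key, and value projections and use several heads.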
models/TeLVE_v1.0dep.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e74ea3f021a45ff9f888c841e8f07924b175fe2a50c73696daa7039be10df48
+ size 904212666
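
Note on the pointer file above: the three added lines are a Git LFS pointer, not the weights themselves. It records the spec version, the SHA-256 of the real file, and its size in bytes (904212666, roughly 904 MB). A minimal sketch of checking a downloaded checkpoint against this pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch assumes the `oid` always uses `sha256`.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 in the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a large checkpoint is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5e74ea3f021a45ff9f888c841e8f07924b175fe2a50c73696daa7039be10df48
size 904212666
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

After downloading the checkpoint, `matches_pointer("models/TeLVE_v1.0dep.pth", pointer)` should return `True` if the local file is the one this commit refers to.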