TeLVE v1.0dep released. Because of an addressing problem during training, it was trained on a dataset roughly half the intended size, so it is not recommended for use.
Browse files
- README.md: +80 -79
- models/TeLVE_v1.0dep.pth: +3 -0
README.md
CHANGED
@@ -1,79 +1,80 @@
---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---

# TeLVE: Turkish efficient Language Vision Engine 🧿

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Models: v1.0, v1.0dep](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)

## First Turkish VLM ever!

TeLVE is the first Visual Language Model specifically designed for Turkish language understanding and image description generation. Built on Vision Transformer (ViT) and BERT pre-trained encoder architectures, it bridges the gap in Turkish visual-linguistic processing.
![TeLVE logo](<teLVE_logo.png>)

## Model Description

TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion
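For a concrete picture of how these three components fit together, here is a minimal sketch of a ViT encoder, the Turkish BERT encoder, and a cross-attention layer wired in PyTorch. The class name, the `google/vit-base-patch16-224` checkpoint id, and the single `nn.MultiheadAttention` layer are illustrative assumptions; TeLVE's actual implementation lives in the repository code and may differ.

```python
import torch.nn as nn
from transformers import ViTModel, BertModel

class VisionLanguageFusionSketch(nn.Module):
    """Illustrative ViT + Turkish BERT fusion; not TeLVE's actual code."""
    def __init__(self):
        super().__init__()
        # Both base encoders produce 768-dimensional hidden states.
        self.vision_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
        self.text_encoder = BertModel.from_pretrained("dbmdz/bert-base-turkish-cased")
        # Cross-attention: text tokens attend to image patch embeddings.
        self.cross_attention = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)

    def forward(self, pixel_values, input_ids, attention_mask):
        image_states = self.vision_encoder(pixel_values=pixel_values).last_hidden_state
        text_states = self.text_encoder(input_ids=input_ids,
                                        attention_mask=attention_mask).last_hidden_state
        fused, _ = self.cross_attention(query=text_states, key=image_states, value=image_states)
        return fused
```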
### Version Logs
- **TeLVE v1.0**: Trained on the Unsplash Lite dataset
- **TeLVE v1.0dep**: Dataset enhanced with selected images from Pexels; the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem; not recommended for use.)*
## Usage

The model can be used in two ways:

### Inference (imagine.py)
```bash
# Generate captions for images
python imagine.py
```
This script:
- Loads a trained TeLVE model
- Takes images from the `images` directory
- Generates Turkish captions for each image
- Outputs the results to the console
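The README does not show `imagine.py`'s internals, so the following is only a hypothetical outline of the flow described above. The checkpoint path, the use of `torch.load` on the `.pth` file, the `*.jpg` glob, and the `generate()` call are all assumptions, not the script's actual API.

```python
from pathlib import Path

import torch
from PIL import Image
from transformers import BertTokenizerFast, ViTImageProcessor

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
tokenizer = BertTokenizerFast.from_pretrained("dbmdz/bert-base-turkish-cased")

# Assumed: the checkpoint stores the full model object (path and format are guesses).
model = torch.load("models/TeLVE_v1.0.pth", map_location="cpu")
model.eval()

for image_path in sorted(Path("images").glob("*.jpg")):
    pixel_values = processor(images=Image.open(image_path).convert("RGB"),
                             return_tensors="pt").pixel_values
    with torch.no_grad():
        output_ids = model.generate(pixel_values)  # hypothetical generation API
    print(image_path.name, tokenizer.decode(output_ids[0], skip_special_tokens=True))
```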
### Training (main.py)
Users can train their own models with ViT and BERT encoders.
```bash
# Train a new model
python main.py
```

This script:
- Loads and preprocesses image-caption pairs
- Initializes ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer
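`main.py`'s data format, loss, and hyperparameters are not documented in this README, so the loop below is only a generic caption-training sketch. The batch keys, the transformers-style `outputs.loss`, the hyperparameters, and the output path are assumptions.

```python
import torch
from torch.utils.data import DataLoader

def train_captioner(model, dataset, epochs=3, lr=5e-5):
    """Generic image-caption training loop; batch keys and hyperparameters are assumptions."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            outputs = model(
                pixel_values=batch["pixel_values"].to(device),
                input_ids=batch["input_ids"].to(device),
                attention_mask=batch["attention_mask"].to(device),
                labels=batch["labels"].to(device),  # caption token ids as targets
            )
            optimizer.zero_grad()
            outputs.loss.backward()  # assumes a transformers-style output with .loss
            optimizer.step()
    torch.save(model, "models/TeLVE_custom.pth")  # hypothetical output path
```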
## Performance

Performance scores will be evaluated.
<!--
| Model Version | Dataset         | BLEU-4 | METEOR | CIDEr |
|---------------|-----------------|--------|--------|-------|
| TeLVE v1.0    | Unsplash        | *TBD*  | *TBD*  | *TBD* |
| TeLVE v1.1    | Unsplash+Pexels | *TBD*  | *TBD*  | *TBD* |
-->
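The repository does not specify an evaluation script. Once reference captions are available, BLEU-4 and METEOR could be computed with the Hugging Face `evaluate` library as sketched below; the captions here are placeholders, and CIDEr would need a separate package (e.g. pycocoevalcap), so it is not shown.

```python
import evaluate

# Placeholder Turkish captions; replace with real model outputs and references.
predictions = ["sahilde gün batımı"]
references = [["sahilde gün batımı manzarası"]]

bleu = evaluate.load("bleu")      # BLEU-4 by default (max_order=4)
meteor = evaluate.load("meteor")

print(bleu.compute(predictions=predictions, references=references))
print(meteor.compute(predictions=predictions, references=references))
```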
## Citation

```bibtex
@software{telve2024,
  author = {Öğüt Su Karagün},
  title  = {TeLVE: Turkish efficient Language Vision Engine},
  year   = {2024},
  url    = {https://huggingface.co/outsu/TeLVE}
}
```

## License

This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
models/TeLVE_v1.0dep.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e74ea3f021a45ff9f888c841e8f07924b175fe2a50c73696daa7039be10df48
size 904212666
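The checkpoint above is stored as a Git LFS pointer. One way to fetch the actual weights programmatically is `huggingface_hub`, as sketched below; the repo id is taken from the README URL, and the filename from the pointer entry above. Note that the README marks v1.0dep as deprecated.

```python
from huggingface_hub import hf_hub_download

# v1.0dep is deprecated; swap in another checkpoint name as needed.
checkpoint_path = hf_hub_download(repo_id="outsu/TeLVE",
                                  filename="models/TeLVE_v1.0dep.pth")
print(checkpoint_path)
```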