Tags: Visual Question Answering · PEFT · Safetensors · French · English
SOKOUDJOU committed verified commit 1109a44 (1 parent: dd710c0)

Update README.md

Files changed (1): README.md (+2 −2)
# paligemma-3b-ft-docvqa-896-lora

**paligemma-3b-ft-docvqa-896-lora** is a fine-tuned version of the **[google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)** model, trained specifically on the **[doc-vqa](https://huggingface.co/datasets/cmarkea/doc-vqa)** dataset published by cmarkea. It was optimized with **LoRA** (Low-Rank Adaptation), a method designed to improve performance while reducing the cost and complexity of fine-tuning.
During training, particular attention was paid to linguistic balance, with a focus on French: for a given image, there was a 70% probability that the question/answer pair was in French. The model operates exclusively in bfloat16 precision, optimizing computational resources. The full training run took three weeks on a single A100 40GB.
- **Model type:** Multi-modal model (image + text)
- **Language(s) (NLP):** French, English
- **License:** Apache 2.0
- **Finetuned from model [optional]:** [google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)

## Usage
 
13
  # paligemma-3b-ft-docvqa-896-lora
14
 
15
 
16
+ **paligemma-3b-ft-docvqa-896-lora** is a fine-tuned version of the **[google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)** model, specifically trained on the **[doc-vqa](https://huggingface.co/datasets/cmarkea/doc-vqa)** dataset published by cmarkea. Optimized using the **LoRA** (Low-Rank Adaptation) method, this model was designed to enhance performance while reducing the complexity of fine-tuning.
17
 
18
  During training, particular attention was given to linguistic balance, with a focus on French. The model was exposed to a predominantly French context, with a 70% likelihood of interacting with French questions/answers for a given image. It operates exclusively in bfloat16 precision, optimizing computational resources. The entire training process took 3 week on a single A100 40GB.
19
 
 
31
  - **Model type:** Multi-modal model (image+text)
32
  - **Language(s) (NLP):** French, English
33
  - **License:** Apache 2.0
34
+ - **Finetuned from model [optional]:** [google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)
35
 
36
 
37
  ## Usage
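A minimal loading sketch: load the base model in bfloat16 (the precision the card says the model operates in) and apply this LoRA adapter with PEFT. The adapter Hub id is inferred from the model name and publisher and may differ, and `document_page.png` is a stand-in for your own document image.

```python
# Minimal usage sketch for this adapter. ADAPTER_ID is assumed from the model
# name; replace it with the actual Hub id if it differs.
BASE_ID = "google/paligemma-3b-ft-docvqa-896"
ADAPTER_ID = "cmarkea/paligemma-3b-ft-docvqa-896-lora"  # assumed Hub id

if __name__ == "__main__":
    import torch
    from PIL import Image
    from peft import PeftModel
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(BASE_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        BASE_ID, torch_dtype=torch.bfloat16  # the card uses bfloat16 only
    )
    model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach LoRA weights

    image = Image.open("document_page.png")   # placeholder document image
    question = "Quel est le montant total ?"  # ~70% of training Q/A were French
    inputs = processor(text=question, images=image, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=32)
    print(processor.decode(output[0], skip_special_tokens=True))
```

Because the LoRA weights are kept separate from the base checkpoint, they can also be merged into the base model with `merge_and_unload()` if you prefer a single standalone model for inference.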