Tags: Visual Question Answering · PEFT · Safetensors · French · English
SOKOUDJOU committed verified commit 1109a44 (1 parent: dd710c0)

Update README.md

Files changed (1): README.md (+2 −2)
# paligemma-3b-ft-docvqa-896-lora

**paligemma-3b-ft-docvqa-896-lora** is a fine-tuned version of the **[google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)** model, trained specifically on the **[doc-vqa](https://huggingface.co/datasets/cmarkea/doc-vqa)** dataset published by cmarkea. It was optimized with **LoRA** (Low-Rank Adaptation), a method designed to improve performance while reducing the cost and complexity of fine-tuning.
During training, particular attention was paid to linguistic balance, with a focus on French: for a given image, there was a 70% probability that the question/answer pair was in French. The model operates exclusively in bfloat16 precision, optimizing computational resources. The full training run took three weeks on a single A100 40GB.
- **Model type:** Multi-modal model (image + text)
- **Language(s) (NLP):** French, English
- **License:** Apache 2.0
- **Finetuned from model [optional]:** [google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)

## Usage
 
13
  # paligemma-3b-ft-docvqa-896-lora
14
 
15
 
16
+ **paligemma-3b-ft-docvqa-896-lora** is a fine-tuned version of the **[google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)** model, specifically trained on the **[doc-vqa](https://huggingface.co/datasets/cmarkea/doc-vqa)** dataset published by cmarkea. Optimized using the **LoRA** (Low-Rank Adaptation) method, this model was designed to enhance performance while reducing the complexity of fine-tuning.
17
 
18
  During training, particular attention was given to linguistic balance, with a focus on French. The model was exposed to a predominantly French context, with a 70% likelihood of interacting with French questions/answers for a given image. It operates exclusively in bfloat16 precision, optimizing computational resources. The entire training process took 3 week on a single A100 40GB.
19
 
 
31
  - **Model type:** Multi-modal model (image+text)
32
  - **Language(s) (NLP):** French, English
33
  - **License:** Apache 2.0
34
+ - **Finetuned from model [optional]:** [google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896)
35
 
36
 
37
  ## Usage
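A minimal loading sketch: load the base model in bfloat16 (the precision the card says the model operates in) and apply this LoRA adapter with PEFT. The adapter Hub id is inferred from the model name and publisher and may differ, and `document_page.png` is a stand-in for your own document image.

```python
# Minimal usage sketch for this adapter. ADAPTER_ID is assumed from the model
# name; replace it with the actual Hub id if it differs.
BASE_ID = "google/paligemma-3b-ft-docvqa-896"
ADAPTER_ID = "cmarkea/paligemma-3b-ft-docvqa-896-lora"  # assumed Hub id

if __name__ == "__main__":
    import torch
    from PIL import Image
    from peft import PeftModel
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(BASE_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        BASE_ID, torch_dtype=torch.bfloat16  # the card uses bfloat16 only
    )
    model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach LoRA weights

    image = Image.open("document_page.png")   # placeholder document image
    question = "Quel est le montant total ?"  # ~70% of training Q/A were French
    inputs = processor(text=question, images=image, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=32)
    print(processor.decode(output[0], skip_special_tokens=True))
```

Because the LoRA weights are kept separate from the base checkpoint, they can also be merged into the base model with `merge_and_unload()` if you prefer a single standalone model for inference.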