SOKOUDJOU committed · Commit e2183ff · verified · 1 Parent(s): af37333

Update README.md

Files changed (1)
  1. README.md +6 -7
README.md CHANGED
@@ -12,9 +12,12 @@ pipeline_tag: visual-question-answering
 
  # paligemma-3b-ft-docvqa-896-lora
 
- paligemma-3b-ft-docvqa-896-lora is a Vision-Language Model (VLM) based on [google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896/edit/main/README.md) model
- and trained in original LLaVA setup using LORA. This model is primarily adapted to work with French, but still capable to work with English.
 
+ paligemma-3b-ft-docvqa-896-lora is a fine-tuned version of the [google/paligemma-3b-ft-docvqa-896](https://huggingface.co/google/paligemma-3b-ft-docvqa-896/edit/main/README.md) model, specifically trained on the [doc-vqa](https://huggingface.co/datasets/cmarkea/doc-vqa) dataset published by cmarkea. Optimized using the LoRA (Low-Rank Adaptation) method, this model was designed to enhance performance while reducing the complexity of fine-tuning.
+
+ During training, particular attention was given to linguistic balance, with a focus on French. The model was exposed to a predominantly French context, with a 70% likelihood of interacting with French questions/answers for a given image. It operates exclusively in bfloat16 precision, optimizing computational resources.
+
+ Thanks to its multilingual specialization and emphasis on French, this model excels in francophone environments, while also performing well in English. It is especially suited for tasks that require the analysis and understanding of complex documents, such as extracting information from forms, invoices, reports, and other text-based documents in a visual question-answering context.
 
  ## Model Details
 
@@ -49,7 +52,7 @@ image = Image.open(requests.get(url, stream=True).raw)
 
  model = PaliGemmaForConditionalGeneration.from_pretrained(
  model_id,
- torch_dtype=dtype,
+ torch_dtype=torch.bfloat16,
  device_map=device,
  ).eval()
 
@@ -67,10 +70,6 @@ with torch.inference_mode():
  print(decoded)
  ```
 
- ## Training Details
-
- [More Information Needed]
-
 
  ### Results
 
75