htdung167 committed
Commit ecb789e
1 Parent(s): f56fe18

Update README.md

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -25,7 +25,7 @@ pipeline_tag: image-text-to-text
 # EraX-VL-7B-V1
 ## Introduction
 
-We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for OCR (optical character recognition) and VQA (visual question-answering) that excels in various languages, with a particular focus on Vietnamese. The `EraX-VL-7B` model stands out for its precise recognition capabilities across a range of documents, including medical forms, invoices, bills of sale, quotes, and medical records. This functionality is expected to be highly beneficial for hospitals, clinics, insurance companies, and other similar applications. Built on the solid foundation of the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, EraX-VL-7B has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
+We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for OCR (optical character recognition) and VQA (visual question-answering) that excels in various languages, with a particular focus on Vietnamese. The `EraX-VL-7B` model stands out for its precise recognition capabilities across a range of documents, including medical forms, invoices, bills of sale, quotes, and medical records. This functionality is expected to be highly beneficial for hospitals, clinics, insurance companies, and other similar applications. Built on the solid foundation of the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-7B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
 
 **EraX-VL-7B-V1** is a young member of our **EraX's LànhGPT** collection of LLM models.
 
@@ -36,6 +36,9 @@ We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for OCR
 - **License:** Apache 2.0
 - **Fine-tuned from:** [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
 
+## Benchmarks
+Coming Soon!!!
+
 ## Quickstart
 Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:
 
@@ -173,10 +176,12 @@ If you find our project useful, we would appreciate it if you could star our rep
 ```
 
 ## References
-[1] Yang, An, et al. "Qwen2 technical report." arXiv preprint arXiv:2407.10671 (2024).
+[1] Qwen team. Qwen2-VL. 2024.
+
+[2] Yang, An, et al. "Qwen2 technical report." arXiv preprint arXiv:2407.10671 (2024).
 
-[2] Chen, Zhe, et al. "InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
+[3] Chen, Zhe, et al. "InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
 
-[3] Chen, Zhe, et al. "How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites." arXiv preprint arXiv:2404.16821 (2024).
+[4] Chen, Zhe, et al. "How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites." arXiv preprint arXiv:2404.16821 (2024).
 
-[4] Tran, Chi, and Huong Le Thanh. "LaVy: Vietnamese Multimodal Large Language Model." arXiv preprint arXiv:2404.07922 (2024).
+[5] Tran, Chi, and Huong Le Thanh. "LaVy: Vietnamese Multimodal Large Language Model." arXiv preprint arXiv:2404.07922 (2024).
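
The Quickstart section referenced in the hunks above relies on `transformers` and `qwen_vl_utils`, but the actual snippet is not part of this diff. As a rough sketch of what such usage typically looks like, the following follows the standard Qwen2-VL loading pattern; the repo id `erax-ai/EraX-VL-7B-V1`, the image path, and the prompt are placeholders, not taken from the README.

```python
# Minimal sketch (not the README's own snippet): standard Qwen2-VL usage with
# transformers + qwen_vl_utils. Repo id, image path, and prompt are placeholders.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "erax-ai/EraX-VL-7B-V1"  # assumed repo id
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn with an image and an OCR-style question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/invoice.jpg"},  # placeholder image
            {"type": "text", "text": "Extract all the text in this invoice."},
        ],
    }
]

# Build the chat prompt, collect vision inputs, then generate.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```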