htdung167 committed
Commit ecb789e
1 Parent(s): f56fe18

Update README.md

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -25,7 +25,7 @@ pipeline_tag: image-text-to-text
 # EraX-VL-7B-V1
 ## Introduction
 
-We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for OCR (optical character recognition) and VQA (visual question-answering) that excels in various languages, with a particular focus on Vietnamese. The `EraX-VL-7B` model stands out for its precise recognition capabilities across a range of documents, including medical forms, invoices, bills of sale, quotes, and medical records. This functionality is expected to be highly beneficial for hospitals, clinics, insurance companies, and other similar applications. Built on the solid foundation of the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, EraX-VL-7B has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
+We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for OCR (optical character recognition) and VQA (visual question-answering) that excels in various languages, with a particular focus on Vietnamese. The `EraX-VL-7B` model stands out for its precise recognition capabilities across a range of documents, including medical forms, invoices, bills of sale, quotes, and medical records. This functionality is expected to be highly beneficial for hospitals, clinics, insurance companies, and other similar applications. Built on the solid foundation of the [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)[1], which we found to be of high quality and fluent in Vietnamese, `EraX-VL-7B` has been fine-tuned to enhance its performance. We plan to continue improving and releasing new versions for free, along with sharing performance benchmarks in the near future.
 
 **EraX-VL-7B-V1** is a young member of our **EraX's LànhGPT** collection of LLM models.
 
@@ -36,6 +36,9 @@ We are excited to introduce **EraX-VL-7B-v1**, a robust multimodal model for OCR
 - **License:** Apache 2.0
 - **Fine-tuned from:** [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
 
+## Benchmarks
+Coming Soon!!!
+
 ## Quickstart
 Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:
 
@@ -173,10 +176,12 @@ If you find our project useful, we would appreciate it if you could star our rep
 ```
 
 ## References
-[1] Yang, An, et al. "Qwen2 technical report." arXiv preprint arXiv:2407.10671 (2024).
+[1] Qwen team. Qwen2-VL. 2024.
+
+[2] Yang, An, et al. "Qwen2 technical report." arXiv preprint arXiv:2407.10671 (2024).
 
-[2] Chen, Zhe, et al. "InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
+[3] Chen, Zhe, et al. "InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
 
-[3] Chen, Zhe, et al. "How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites." arXiv preprint arXiv:2404.16821 (2024).
+[4] Chen, Zhe, et al. "How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites." arXiv preprint arXiv:2404.16821 (2024).
 
-[4] Tran, Chi, and Huong Le Thanh. "LaVy: Vietnamese Multimodal Large Language Model." arXiv preprint arXiv:2404.07922 (2024).
+[5] Tran, Chi, and Huong Le Thanh. "LaVy: Vietnamese Multimodal Large Language Model." arXiv preprint arXiv:2404.07922 (2024).
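
The Quickstart section referenced in the hunks above relies on `transformers` and `qwen_vl_utils`, but the actual snippet is not part of this diff. As a rough sketch of what such usage typically looks like, the following follows the standard Qwen2-VL loading pattern; the repo id `erax-ai/EraX-VL-7B-V1`, the image path, and the prompt are placeholders, not taken from the README.

```python
# Minimal sketch (not the README's own snippet): standard Qwen2-VL usage with
# transformers + qwen_vl_utils. Repo id, image path, and prompt are placeholders.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "erax-ai/EraX-VL-7B-V1"  # assumed repo id
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn with an image and an OCR-style question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/invoice.jpg"},  # placeholder image
            {"type": "text", "text": "Extract all the text in this invoice."},
        ],
    }
]

# Build the chat prompt, collect vision inputs, then generate.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```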