rhymes-ai/Aria-Base-64K

Image-Text-to-Text

Inference Endpoints


teowu committed 22 days ago

Commit 974542e • 1 Parent(s): c1492b4

Update README.md

Files changed (1)

README.md +2 -2


@@ -23,7 +23,7 @@ tags:
 This checkpoint is one of base models of [Aria](https://huggingface.co/rhymes-ai/Aria), designed for research purposes as well as continue training. Specifically, Aria-Base-64K corresponds to the model checkpoint after the long-context pre-training stage (boxed in purple).
-<img src="./aria-stages.png" alt="Aria Training Stages" style="width: 75%;">
 Aria-Base-64K is fine-tuned from [Aria-Base-8K](https://huggingface.co/teowu/Aria-Base-8K).
@@ -39,7 +39,7 @@ Aria-Base-64K is fine-tuned from [Aria-Base-8K](https://huggingface.co/teowu/Ari
 - **Appropriate for Video and Long-document Fine-tuning**: This model is recommended for long-form continue pre-training or fine-tuning, e.g. on video QA datasets or long-document QA datasets. While resource is limited, it is also possible to post-train this model with short instruction tuning datasets and transfer to long-form QA scenarios.
 - **Understanding on Hundreds of Images**: This model is capable of understanding up to 250 high-resolution images or up to 500 mid-resolution images.
 - **Strong Base Performance on Language and Multimodal Scenarios**: This model retains strong base performance as [Aria-Base-8K](https://huggingface.co/teowu/Aria-Base-8K).
-- ***Limited Chat Template Availability***: This model is trained with a very low percentage of data (around 3%) re-formatted with the chat template. Hence, it might not be optimal to be directly tested with various benchmarks.
 <!-- # Model Info

 This checkpoint is one of base models of [Aria](https://huggingface.co/rhymes-ai/Aria), designed for research purposes as well as continue training. Specifically, Aria-Base-64K corresponds to the model checkpoint after the long-context pre-training stage (boxed in purple).
+<img src="./aria-stages.png" alt="Aria Training Stages" style="width: 100%;">
 Aria-Base-64K is fine-tuned from [Aria-Base-8K](https://huggingface.co/teowu/Aria-Base-8K).
 - **Appropriate for Video and Long-document Fine-tuning**: This model is recommended for long-form continue pre-training or fine-tuning, e.g. on video QA datasets or long-document QA datasets. While resource is limited, it is also possible to post-train this model with short instruction tuning datasets and transfer to long-form QA scenarios.
 - **Understanding on Hundreds of Images**: This model is capable of understanding up to 250 high-resolution images or up to 500 mid-resolution images.
 - **Strong Base Performance on Language and Multimodal Scenarios**: This model retains strong base performance as [Aria-Base-8K](https://huggingface.co/teowu/Aria-Base-8K).
+- ***Limited Chat Template Availability***: This model is trained with a very low percentage of data (around 3%) re-formatted with the chat template. Hence, it might not be optimal to be directly used with chat templates.
 <!-- # Model Info