sbrzz/TinyLLaVA-Qwen2.5-0.5B-Instruct-dinov2-small

Visual Question Answering


sbrzz committed on Nov 5, 2024

Commit 09d5726 · verified · 1 Parent(s): f736b3d

Update README.md

Files changed (1)

README.md +18 -0


@@ -10,6 +10,16 @@ pipeline_tag: visual-question-answering
 Pretrain stage only, 4630 epochs
 | Category    | # Samples | TP | FP | TN | FN | Accuracy | Precision | Recall | F1 Score | Yes Ratio |
 |-----------------|---------------|--------|--------|--------|--------|--------------|---------------|------------|--------------|---------------|
 | Adversarial     | 3000          | 1312   | 1250   | 250    | 188    | 0.521        | 0.512        | 0.875      | 0.646        | 0.854         |
@@ -17,6 +27,14 @@ Pretrain stage only, 4630 epochs
 | Random          | 2910          | 1312   | 1185   | 225    | 188    | 0.528        | 0.525        | 0.875      | 0.656        | 0.858         |
 [MMMU](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#mmmu)
 | Category                        | # Samples | Accuracy |

 Pretrain stage only, 4630 epochs
+# Introduction
+We use [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) to build a very small image-text-to-text model.
+The goal is to run LLaVA-style models on edge devices that have only a few gigabytes of memory.
+For the LLM and the vision tower, we choose [Qwen2.5-0.5B-Instruct](Qwen/Qwen2.5-0.5B-Instruct) and [facebook/dinov2-small](facebook/dinov2-small), respectively.
+[POPE](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#pope):
 | Category    | # Samples | TP | FP | TN | FN | Accuracy | Precision | Recall | F1 Score | Yes Ratio |
 |-----------------|---------------|--------|--------|--------|--------|--------------|---------------|------------|--------------|---------------|
 | Adversarial     | 3000          | 1312   | 1250   | 250    | 188    | 0.521        | 0.512        | 0.875      | 0.646        | 0.854         |
 | Random          | 2910          | 1312   | 1185   | 225    | 188    | 0.528        | 0.525        | 0.875      | 0.656        | 0.858         |
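
The derived columns in the POPE table follow from the four confusion-matrix counts. The sketch below is not part of the TinyLLaVA Factory evaluation code; `pope_metrics` is a hypothetical helper, and it assumes "Yes Ratio" means the fraction of samples answered "yes", i.e. (TP + FP) / total. Under that assumption it should reproduce the reported values once rounded to three decimals.

```python
def pope_metrics(tp, fp, tn, fn):
    """Derive the POPE table columns from confusion-matrix counts.

    Hypothetical helper for illustration only; "yes_ratio" is assumed to be
    the share of all samples for which the model answered "yes".
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "samples": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "yes_ratio": (tp + fp) / total,
    }


# Counts copied from the table rows above: (TP, FP, TN, FN).
rows = {
    "Adversarial": (1312, 1250, 250, 188),
    "Random": (1312, 1185, 225, 188),
}

for name, counts in rows.items():
    metrics = pope_metrics(*counts)
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Read this way, the high recall comes with a yes ratio of about 0.85: the model answers "yes" to most questions, which is consistent with accuracy staying close to 0.5.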
+[TEXTVQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#textvqa)
+Samples 5000, Accuracy 0% (:-|)
+[SCIENCEQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#scienceqa)
+Samples 4241, Correct: -, Accuracy: -%, IMG-Accuracy: -%
 [MMMU](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#mmmu)
 | Category                        | # Samples | Accuracy |