ubuntu committed
Commit f173fcb · Parent: 82855e1
update readme

README.md CHANGED
@@ -71,7 +71,7 @@ For a higher resolution 448×672 image, we split it into 6 local image blocks us
 > <sup>1: Models marked with `*` are closed-source.</sup>
 
 For all the compared models above, we prioritize reporting their officially published results. Where official results are unavailable, we use the results reported on the [OpenCompass leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal). If the corresponding dataset results are also missing from the OpenCompass leaderboard,
-the data comes from our own evaluation runs. The evaluation framework used is the [
+the data comes from our own evaluation runs. The evaluation framework used is the [VLMEvalKit evaluation framework](https://github.com/open-compass/VLMEvalKit/).
 
 ### Traditional VQA Tasks
 Traditional VQA tasks are benchmarks frequently cited in academic papers on multimodal visual question answering and therefore carry significant academic reference value, so we also report evaluation results on datasets of this kind.
@@ -84,7 +84,7 @@ For a higher resolution 448×672 image, we split it into 6 local image blocks us
 | VizWiz | **81.9** | 54.6 | 75.6 | 64.0 | 50.1 | 44.0 | 41.4 | 70.8 |
 | TextVQA | **74.2** | 64.3 | 53.7 | 62.4 | 63.8 | 69.6 | 63.1 | 54.0 |
 
-Similarly, for all the compared models above, we prioritize reporting their officially published results. Where official results are unavailable, the data comes from our own evaluation runs. The evaluation framework used is the [
+Similarly, for all the compared models above, we prioritize reporting their officially published results. Where official results are unavailable, the data comes from our own evaluation runs. The evaluation framework used is the [VLMEvalKit evaluation framework](https://github.com/open-compass/VLMEvalKit/).
 
 ## Evaluation Reports
 
@@ -110,7 +110,7 @@ To comprehensively assess the model's performance, we conducted thorough testing
 
 For all the compared models mentioned above, we prioritize reporting their officially published results. In cases where official results are unavailable, we rely on the reported results from the [OpenCompass leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal).
 If the corresponding dataset evaluation results are still missing from the OpenCompass leaderboard, we include data obtained from our own evaluation process.
-The evaluation framework used adheres to the [
+The evaluation framework used adheres to the [VLMEvalKit evaluation framework](https://github.com/open-compass/VLMEvalKit/).
 
 ### Traditional VQA tasks
 The traditional Visual Question Answering (VQA) task, frequently referenced in academic literature in the field of multimodal visual question answering, holds significant academic reference value.
@@ -127,7 +127,7 @@ Therefore, we will also report relevant evaluation results on datasets of this k
 
 
 Similarly, for all the compared models mentioned above, we prioritize reporting their officially published results. In the absence of official results, data is obtained from our own evaluation process.
-The evaluation framework used adheres to the [
+The evaluation framework used adheres to the [VLMEvalKit evaluation framework](https://github.com/open-compass/VLMEvalKit/).
 
 
 ## Examples
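The change above links the evaluation methodology to VLMEvalKit, which is driven from the command line via its `run.py` entry point. As a rough, non-authoritative sketch of how such evaluation runs might be launched, the snippet below shells out to that entry point from Python; the dataset keys and model identifier are illustrative placeholders (not names taken from this repository) and should be replaced with identifiers actually registered in VLMEvalKit.

```python
# Minimal sketch of driving VLMEvalKit from Python, assuming its repository has
# been cloned and this script is run from the repository root.
# The dataset keys and model name below are placeholders; substitute the
# identifiers registered in VLMEvalKit's configuration.
import subprocess

DATASETS = ["TextVQA_VAL", "VizWiz"]  # placeholder benchmark keys
MODEL = "my_vlm"                      # placeholder model identifier

for dataset in DATASETS:
    # run.py is VLMEvalKit's command-line entry point; --data selects the
    # benchmark and --model selects the model configuration to evaluate.
    subprocess.run(
        ["python", "run.py", "--data", dataset, "--model", MODEL, "--verbose"],
        check=True,
    )
```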