q-future/co-instruct

@@ -1,14 +1,16 @@
 ## Performance
 ### Low-level Question-Answering
-This model has reached 75.12\%(*12\% better than previous version*)/74.98\%(*8.5\% better than previous version*) on Q-Bench A1 *dev/test* (multi-choice questions).
 It also outperforms the following closed-source models with much larger model capacities:
 | Model | *dev* | *test* |
 | ---- | ---- | ---- |
-| **Co-Instruct-Preview (mPLUG-Owl2) (This Model)** | **75.12\%** | **74.98\%** |
 | \*GPT-4V-Turbo | 74.41\% | 74.10\% |
 | \*Qwen-VL-**Max** | 73.63\%  | 73.90\% |
 | \*GPT-4V (Nov. 2023) | 71.78\% | 73.44\% |
@@ -23,8 +25,8 @@ It also outperforms the following close-source models with much larger model cap
 | Model                    | live         | agi          | livec       | test_spaq   | csiq        | test_kadid  | test_koniq  | konvid      | maxwell_test |
 |--------------------------|--------------|--------------|-------------|-------------|-------------|-------------|-------------|-------------|--------------|
-|**Co-Instruct-Preview (mPLUG-Owl2) (This Model)**     | **0.771/0.751**  | **0.727/0.749**  | **0.861/0.865** | **0.946/0.938** | **0.735/0.748** | **0.782/0.770** | **0.908/0.941** | **0.818/0.790** | **0.735/0.714**  |
-| Q-Instruct (mPLUG-Owl2, Nov. 2023) | 0.749/0.747  | 0.710/0.753  | 0.781/0.791 | 0.921/0.917 | 0.693/0.723 | 0.670/0.665 | 0.904/0.921 | 0.766/0.738 | 0.650/0.649  |
 We are also constructing multi-image benchmark sets (image pairs, triple-quadruple images), and the results on multi-image benchmarks will be released soon!
@@ -38,7 +40,7 @@ from transformers import AutoModelForCausalLM
 model = AutoModelForCausalLM.from_pretrained("q-future/co-instruct-preview",
                                              trust_remote_code=True,
                                              torch_dtype=torch.float16,
-                                             attn_implementation="flash_attention_2",
                                              device_map={"":"cuda:0"})
 ```

 ## Performance
+*Updated Feb 1st.*
 ### Low-level Question-Answering
+This model has reached 75.90\%(*13\% better than previous version*)/76.52\%(*10\% better than previous version*) on Q-Bench A1 *dev/test* (multi-choice questions).
 It also outperforms the following closed-source models with much larger model capacities:
 | Model | *dev* | *test* |
 | ---- | ---- | ---- |
+| **Co-Instruct-Preview (mPLUG-Owl2) (This Model)** | **75.90\%** | **76.52\%** |
 | \*GPT-4V-Turbo | 74.41\% | 74.10\% |
 | \*Qwen-VL-**Max** | 73.63\%  | 73.90\% |
 | \*GPT-4V (Nov. 2023) | 71.78\% | 73.44\% |
 | Model                    | live         | agi          | livec       | test_spaq   | csiq        | test_kadid  | test_koniq  | konvid      | maxwell_test |
 |--------------------------|--------------|--------------|-------------|-------------|-------------|-------------|-------------|-------------|--------------|
+|**Co-Instruct-Preview (mPLUG-Owl2) (This Model)**     | **0.803/0.756**  | **0.719**/0.732  | **0.827/0.835** | **0.946/0.937** | **0.711/0.727** | **0.782/0.766** | 0.886/**0.935** | **0.818/0.790** | **0.735/0.714**  |
+| Q-Instruct (mPLUG-Owl2, Nov. 2023) | 0.749/0.747  | 0.710/**0.753**  | 0.781/0.791 | 0.921/0.917 | 0.693/0.723 | 0.670/0.665 | **0.904**/0.921 | 0.766/0.738 | 0.650/0.649  |
 We are also constructing multi-image benchmark sets (image pairs, triple-quadruple images), and the results on multi-image benchmarks will be released soon!
 model = AutoModelForCausalLM.from_pretrained("q-future/co-instruct-preview",
                                              trust_remote_code=True,
                                              torch_dtype=torch.float16,
+                                             attn_implementation="eager",
                                              device_map={"":"cuda:0"})
 ```
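
The loading snippet in this change switches `attn_implementation` from `"flash_attention_2"` to `"eager"`. As a minimal sketch, assuming you would like to keep FlashAttention-2 on machines where the `flash_attn` package is installed and fall back to eager attention elsewhere, the value could be chosen at runtime. `pick_attn_implementation` and `load_model` are hypothetical helper names introduced here for illustration; they are not part of this repository, and whether the model's remote code still works with `"flash_attention_2"` is an assumption based on the previous README.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Hypothetical helper: choose an attention backend name.

    Returns "flash_attention_2" when the `flash_attn` package can be found,
    otherwise "eager" (the value the README now uses).
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


def load_model(model_id: str = "q-future/co-instruct-preview"):
    """Hypothetical wrapper around the README's loading call."""
    # Deferred imports so pick_attn_implementation() works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        attn_implementation=pick_attn_implementation(),
        device_map={"": "cuda:0"},
    )
```

Calling `load_model()` should then behave like the snippet above on a machine without `flash_attn`, since the helper returns `"eager"` in that case.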