Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.

This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-8b-beta, is based on the HuggingFaceM4/idefics2-8b-chatty model. The model was trained on a combination of scientific text-image data extracted from Wikipedia and from scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about the technical aspects of the model, its training, and example applications to materials science problems are provided in the paper (reference at the bottom).

### Chat Format

The lamm-mit/Cephalo-Idefics-2-vision-8b-beta model is suitable for one or more image inputs, with prompts using the chat format as follows:

```raw
User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
```
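As a minimal illustration of this raw format, the turns can be assembled with a small string helper. This is a hypothetical sketch for clarity only; in practice the Idefics2 processor's chat template builds the prompt, including the image tokens:

```python
def format_chat(turns):
    """Render (role, text) turns into the raw chat format shown above.

    Hypothetical helper for illustration only; the real pipeline uses the
    processor's chat template, which also inserts image placeholder tokens.
    """
    lines = [f"{role}: {text}" for role, text in turns]
    # Append the generation prompt so the model answers as the Assistant.
    return "\n".join(lines) + "\nAssistant:"

prompt = format_chat([
    ("User",
     "You carefully study the image, and respond accurately, but "
     "succinctly. Think step-by-step."),
])
print(prompt)
```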
Load the model as follows:

```python
DEVICE = 'cuda:0'

from transformers import AutoProcessor, Idefics2ForConditionalGeneration
from tqdm.notebook import tqdm
import torch

model_id = 'lamm-mit/Cephalo-Idefics-2-vision-8b-beta'

model = Idefics2ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # if your GPU allows
).to(DEVICE)
```
If your GPU allows, load and run inference in half precision (`torch.float16` or `torch.bfloat16`):

```diff
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
+    torch_dtype=torch.float16,
).to(DEVICE)
```
Make sure to install `flash-attn`; refer to the original repository of Flash Attention for installation instructions. Then pass the attention implementation when loading the model:

```diff
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
+    torch_dtype=torch.bfloat16,
+    _attn_implementation="flash_attention_2",
).to(DEVICE)
```
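Whether `flash-attn` is actually installed can be checked at runtime before opting into `flash_attention_2`. The sketch below is a hypothetical convenience; `"eager"` is the default attention implementation in `transformers`:

```python
import importlib.util

def flash_attn_available() -> bool:
    # flash-attn is an optional dependency that must be installed separately.
    return importlib.util.find_spec("flash_attn") is not None

# Fall back to the default ("eager") implementation when flash-attn is absent.
attn_impl = "flash_attention_2" if flash_attn_available() else "eager"
print(attn_impl)
```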
Alternatively, load the model in 4-bit quantization with `bitsandbytes`:

```diff
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForVision2Seq.from_pretrained(
    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
+    torch_dtype=torch.bfloat16,
+    quantization_config=quantization_config,
).to(DEVICE)
```
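To put the precision options in perspective, here is a back-of-the-envelope estimate of the weight memory for a roughly 8B-parameter model at each precision. This counts weights only; activations and the KV cache come on top:

```python
# Rough weight-memory footprint of an ~8B-parameter model at various
# precisions. Weights only: activations and the KV cache are extra.
PARAMS = 8e9

def weight_gib(bits_per_param: float) -> float:
    """Gibibytes needed to store PARAMS weights at the given bit width."""
    return PARAMS * bits_per_param / 8 / 2**30

print(f"float32:          {weight_gib(32):.1f} GiB")  # ~29.8 GiB
print(f"float16/bfloat16: {weight_gib(16):.1f} GiB")  # ~14.9 GiB
print(f"4-bit quantized:  {weight_gib(4):.1f} GiB")   # ~3.7 GiB
```

This is why half precision roughly halves, and 4-bit quantization roughly quarters again, the GPU memory needed just to hold the weights.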