mjbuehler committed on
Commit
97bcd10
1 Parent(s): abc2829

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -36,11 +36,11 @@ The model is developed to process diverse inputs, including images and text, fac
 
 Cephalo provides a robust framework for multimodal interaction and understanding, including the development of complex generative pipelines to create 2D and 3D renderings of material microstructures as input for additive manufacturing methods.
 
-This version of Cephalo, lamm-mit/Cephalo-Phi-3-vision-128k-4b-alpha, is based on the HuggingFaceM4/idefics2-8b-chatty model. The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
+This version of Cephalo, lamm-mit/Cephalo-Idefics-2-vision-8b-beta, is based on the HuggingFaceM4/idefics2-8b-chatty model. The model was trained on a combination of scientific text-image data extracted from Wikipedia and scientific papers. For further details on the base model, see: https://huggingface.co/HuggingFaceM4/idefics2-8b-chatty. More details about technical aspects of the model, training and example applications to materials science problems are provided in the paper (reference at the bottom).
 
 ### Chat Format
 
-The lamm-mit/Cephalo-Idefics-2-vision-8b-alpha is suitable for one or more image inputs, with prompts using the chat format as follows:
+The lamm-mit/Cephalo-Idefics-2-vision-8b-beta is suitable for one or more image inputs, with prompts using the chat format as follows:
 
 ```raw
 User: You carefully study the image, and respond accurately, but succinctly. Think step-by-step.
@@ -76,7 +76,7 @@ DEVICE='cuda:0'
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration
 from tqdm.notebook import tqdm
 
-model_id='lamm-mit/Cephalo-Idefics-2-vision-8b-alpha'
+model_id='lamm-mit/Cephalo-Idefics-2-vision-8b-beta'
 
 model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
     torch_dtype=torch.bfloat16, #if your GPU allows
@@ -256,7 +256,7 @@ If your GPU allows, load and run inference in half precision (`torch.float16` or
 
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.float16,
 ).to(DEVICE)
 ```
@@ -277,7 +277,7 @@ Make sure to install `flash-attn`. Refer to the [original repository of Flash Att
 
 ```diff
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.bfloat16,
 +    _attn_implementation="flash_attention_2",
 ).to(DEVICE)
@@ -300,7 +300,7 @@ quantization_config = BitsAndBytesConfig(
     bnb_4bit_compute_dtype=torch.bfloat16
 )
 model = AutoModelForVision2Seq.from_pretrained(
-    "lamm-mit/Cephalo-Idefics-2-vision-8b-alpha",
+    "lamm-mit/Cephalo-Idefics-2-vision-8b-beta",
 +    torch_dtype=torch.bfloat16,
 +    quantization_config=quantization_config,
 ).to(DEVICE)
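The chat format shown in the README's "Chat Format" section corresponds to the `messages` structure that Hugging Face chat templates (e.g. `processor.apply_chat_template` for Idefics2) consume. A minimal sketch of building such a message, assuming the standard Idefics2 convention of `{"type": "image"}` placeholders followed by a `{"type": "text"}` entry (the helper name `build_messages` is illustrative, not part of the library):

```python
# Sketch: build a single-turn user message for an Idefics2-style chat template.
# Assumption: one {"type": "image"} placeholder per input image, then the text prompt.

def build_messages(question: str, n_images: int = 1) -> list:
    """Return a messages list with n_images image slots followed by the prompt text."""
    content = [{"type": "image"} for _ in range(n_images)]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]

messages = build_messages(
    "You carefully study the image, and respond accurately, but succinctly. Think step-by-step.",
    n_images=1,
)
```

This list would then be rendered to a prompt string via the processor's chat template and passed to `processor(...)` together with the PIL images.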
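The half-precision and 4-bit options in the diff above trade accuracy for memory. A back-of-the-envelope sketch of why they matter for an ~8B-parameter model (the parameter count is an assumption from the model name; this ignores activations, the KV cache, and quantization overhead such as scales and zero-points):

```python
# Rough weight-memory estimate: parameters x bits-per-parameter, converted to GB.
# Assumption: ~8e9 parameters; overheads (activations, KV cache, quant metadata) ignored.

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 8e9
full_bf16 = weight_memory_gb(n_params, 16)  # bfloat16/float16 weights -> 16.0 GB
four_bit = weight_memory_gb(n_params, 4)    # bitsandbytes 4-bit weights -> 4.0 GB
```

So 4-bit loading cuts weight memory roughly fourfold versus bfloat16, which is why the README pairs `quantization_config` with `bnb_4bit_compute_dtype=torch.bfloat16` for GPUs that cannot hold the full-precision weights.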