mjbuehler committed
Commit
657a033
1 Parent(s): c1cdd15

Update README.md

Files changed (1)
  1. README.md +84 -113
README.md CHANGED
@@ -57,98 +57,74 @@ User: How could this be used to design a fracture resistant material?<end_of_utt
 Assistant:
 ```
 
-If you need to manually set the chat template:
-
-```
-IDEFICS2_CHAT_TEMPLATE = "{% for message in messages %}{{message['role'].capitalize()}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '<image>' }}{% endif %}{% endfor %}<end_of_utterance>\n{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}"
-```
 
 ### Sample inference code
 
 This code snippet shows how to quickly get started on a GPU:
 
 ```python
-from PIL import Image
-import requests
 
-DEVICE='cuda:0'
 
-from transformers import AutoProcessor, Idefics2ForConditionalGeneration
-from tqdm.notebook import tqdm
-
-model_id='lamm-mit/Cephalo-Idefics-2-vision-8b-beta'
-
-model = Idefics2ForConditionalGeneration.from_pretrained( model_id,
-    torch_dtype=torch.bfloat16, #if your GPU allows
-    _attn_implementation="flash_attention_2", #make sure Flash Attention 2 is installed
-    trust_remote_code=True,
-).to (DEVICE)
-processor = AutoProcessor.from_pretrained(
-    f"{model_id}",
-    do_image_splitting=True
-)
 ```
-See section towards the end for more comments on model optimization, including quantization.
 
 
-If you need to manually set the chat template:
 
-```python
-IDEFICS2_CHAT_TEMPLATE = "{% for message in messages %}{{message['role'].capitalize()}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '<image>' }}{% endif %}{% endfor %}<end_of_utterance>\n{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}"
-tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
-tokenizer.chat_template = IDEFICS2_CHAT_TEMPLATE
-processor.tokenizer = tokenizer
-```
-
-Simple inference example:
 
 ```
-from transformers.image_utils import load_image
 
-image = load_image("https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg")
 
-# Create inputs
 messages = [
-    {
-        "role": "user",
-        "content": [
-            {"type": "image"},
-            {"type": "text", "text": "What is shown in this image, and what is the relevance for materials design? Include a discussion of multi-agent AI."},
-        ]
-    },
 ]
-prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
 
-# Get inputs using the processor
-inputs = processor(text=prompt, images=[image], return_tensors="pt")
-inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
 
-# Generate
-generated_ids = model.generate(**inputs, max_new_tokens=500)
-generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
 
-print(generated_texts)
 ```
 
 Next we provide a convenience function for inference. This function takes the model, processor, question, and images, along with messages and images objects for repeated chat-like interactions with the model.
 
 ```python
-def ask_about_image (model, processor, question,
-                     images_input=[],
-                     verbatim=False,
-                     temperature=0.1,
-                     show_image=False,
-                     system="You are a biomaterials scientist who responds accurately. ",
-                     init_instr = "",
-                     show_conversation=True,
-                     max_new_tokens=256,
-                     messages=[],
-                     images=[],
-                     use_Markdown=False,
-                    ):
 
-
-    query = question
     images_input=ensure_list(images_input)
     if len (images)==0:
         if len (images_input)>0:
@@ -156,75 +132,68 @@ def ask_about_image (model, processor, question,
                 if is_url(image):
                     image= load_image(image)
                 images.append (image)
-
                 if show_image:
                     display ( image )
     if len (messages)==0:
-
-        base_message = {
-            "role": "user",
-            "content": [
-                {"type": "text", "text": system + init_instr},
-                # Image messages will be added dynamically here
-                {"type": "text", "text": query}
-            ]
-        }
-
-        # Ensure the images_input is a list
-        images_input = ensure_list(images_input)
-
-        # Add image messages dynamically
-        image_messages = [{"type": "image"} for _ in images_input]
-        base_message["content"][1:1] = image_messages # Insert image messages before the last text message
 
-        # Append the constructed message to messages list
-        messages.append(base_message)
-
     else:
         messages.append (
-            {
-                "role": "user",
-                "content": [
-                    {"type": "text", "text": query
-                    }
-                ]
-            }
-        )
     if verbatim:
         print (messages)
 
-    text = processor.apply_chat_template(messages, add_generation_prompt=True)
-    inputs = processor(text=[text.strip()], images=images, return_tensors="pt", padding=True).to(DEVICE)
-
-    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True)
-    generated_texts = processor.batch_decode(generated_ids[:, inputs["input_ids"].size(1):], skip_special_tokens=True)
-
-    messages.append (
-        {
-            "role": "assistant",
-            "content": [ {"type": "text", "text": generated_texts[0]}, ]
-        }
-    )
     formatted_conversation = format_conversation(messages, images)
 
-    # Display the formatted conversation, e.g. in Jupyter Notebook
     if show_conversation:
-
         if use_Markdown:
             display(Markdown(formatted_conversation))
         else:
             display(HTML(formatted_conversation))
-
     return generated_texts, messages, images
 
-question = "What is shown in this image, and what is the relevance for materials design? Include a discussion of multi-agent AI."
 
 url1 = "https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg"
 
 response, messages,images= ask_about_image ( model, processor, question,
                                              images_input=[url1,],
                                              temperature=0.1,
-                                             system= '', init_instr='You carefully study the image, and respond accurately, but succinctly. Think step-by-step.\n\n',
                                              show_conversation=True,
                                              max_new_tokens=512, messages=[], images=[])
 ```
@@ -235,11 +204,13 @@ Sample output:
 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
 
 <pre style="white-space: pre-wrap;">
-The image depicts a group of ants moving in a coordinated manner, demonstrating their ability to navigate complex environments and adapt to changing conditions. This behavior is relevant for materials design because it highlights the potential of multi-agent AI systems to mimic natural systems and develop new materials with enhanced properties.
 
-Multi-agent AI refers to the use of multiple autonomous agents working together to solve complex problems. These agents can learn from each other and adapt to new situations, similar to how ants can navigate their environment and communicate with one another. By applying these principles to materials design, researchers can develop new materials that exhibit improved performance, such as enhanced strength, flexibility, and adaptability.
 
-The relevance of this image for materials design lies in the inspiration it provides for developing new materials that can mimic the natural efficiency and adaptability of ants. By studying the behavior of ants, researchers can gain insights into how to design materials that can respond dynamically to changes in their environment, leading to improved performance and functionality.
 </pre>
 
 ## Dataset generation
 
 Assistant:
 ```
 
+
 
 ### Sample inference code
 
 This code snippet shows how to quickly get started on a GPU:
 
 ```python
+model_id='lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k'
 
+model = MllamaForConditionalGeneration.from_pretrained( model_id, torch_dtype=torch.bfloat16,
+                                                        #_attn_implementation="flash_attention_2",
+                                                        trust_remote_code=True,
+                                                        ).to (DEVICE )
 
+processor = AutoProcessor.from_pretrained( model_id, trust_remote_code=True, )
 ```
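
The snippet above assumes that `torch`, the model and processor classes, and a `DEVICE` string are already in scope; a minimal self-contained setup could look like the following sketch (adjust `DEVICE` to your hardware):

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor

DEVICE = 'cuda:0'  # assumed device string; use 'cpu' if no GPU is available

model_id = 'lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k'

# Load the weights in bfloat16 (if your GPU supports it) and move them to the device
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(DEVICE)

# The processor handles both image preprocessing and the chat template/tokenization
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```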
 
 
+Simple inference example:
 
+We ask a question about the image below, which shows a material microstructure and the associated stress-strain responses.
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/4JwIGSfl82hMEyHasOSU4.png)
 
 ```
+import requests
+import torch
+from PIL import Image
 
+url = "https://huggingface.co/lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k/resolve/main/architected_stress_strain.png"
+
+image = Image.open(requests.get(url, stream=True).raw)
 
 messages = [
+    {"role": "user", "content": [
+        {"type": "image"},
+        {"type": "text", "text": "Consider the stress-strain response under compression. What are the three curves shown. Based on an inspection of the plot, do they show good agreement or are they very different?"}
+    ]}
 ]
+input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
+inputs = processor(image, input_text, return_tensors="pt").to(model.device)
 
+output = model.generate(**inputs, max_new_tokens=512)
+print(processor.decode(output[0]))
+```
 
+Raw output:
 
+```
+<|begin_of_text|><|start_header_id|>user<|end_header_id|>
+
+<|image|>Consider the stress-strain response under compression. What are the three curves shown. Based on an inspection of the plot, do they show good agreement or are they very different?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+The image shows three curves representing the stress-strain response under compression. The x-axis represents strain, which is the deformation experienced by the material relative to its original length, while the y-axis represents stress, which is the force applied per unit area.
+
+- The blue curve is labeled "Predicted," indicating a predicted model or simulation result.
+- The orange curve is labeled "Ground truth," indicating actual experimental data or true values.
+- The green curve is labeled "Simulation result," likely representing another simulation result for comparison.
+
+The curves show an increasing trend of stress with strain, indicating that the material becomes more stressed as it deforms. The predicted and simulation results (blue and green curves) closely follow the ground truth (orange curve), suggesting good agreement among the predicted and simulated models and the actual experimental data. This implies that the models used are accurate in predicting the material's response under compression. The curves do not show significant deviations, indicating reliable modeling and simulation techniques.<|eot_id|>
 ```
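
If you prefer to see only the newly generated answer rather than the full raw output, one option is to slice off the prompt tokens before decoding; a small sketch using the variables from the snippet above:

```python
# Keep only the tokens generated after the prompt, then decode without special tokens
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(output[0][prompt_len:], skip_special_tokens=True))
```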
 
 Next we provide a convenience function for inference. This function takes the model, processor, question, and images, along with messages and images objects for repeated chat-like interactions with the model.
 
 ```python
+def ask_about_image (model, processor, question, images_input=[], verbatim=False,temperature=0.1,show_image=False,
+                     system="You are a materials scientist. ", init_instr = "", show_conversation=True,
+                     max_new_tokens=256, messages=[], images=[], use_Markdown=False):
 
     images_input=ensure_list(images_input)
     if len (images)==0:
         if len (images_input)>0:
                 if is_url(image):
                     image= load_image(image)
                 images.append (image)
+
                 if show_image:
                     display ( image )
     if len (messages)==0:
+        messages = [
+            {"role": "user", "content": [
+                {"type": "image"},
+                {"type": "text", "text": question}
+            ]}
+        ]
+        input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
+        inputs = processor(image, input_text, return_tensors="pt").to(model.device)
 
     else:
         messages.append (
+            {"role": "user", "content": [
+
+                {"type": "text", "text": question}
+            ]} )
+
     if verbatim:
         print (messages)
 
+    text = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+    inputs = processor(text=text, images=images, return_tensors="pt", ).to(DEVICE)
+
+    generation_args = {
+        "max_new_tokens": max_new_tokens,
+        "temperature": temperature,
+        "do_sample": True,
+    }
+
+    generate_ids = model.generate(**inputs, # eos_token_id=processor.tokenizer.eos_token_id,
+                                  **generation_args)
+
+    generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:-1]
+    generated_texts = processor.decode(generate_ids[0], clean_up_tokenization_spaces=False)
+
+    messages.append ( {"role": "assistant", "content": [ {"type": "text", "text": generated_texts}]} )
+
     formatted_conversation = format_conversation(messages, images)
 
+    # Display the formatted conversation in Jupyter Notebook
     if show_conversation:
         if use_Markdown:
             display(Markdown(formatted_conversation))
         else:
             display(HTML(formatted_conversation))
+
     return generated_texts, messages, images
 
+question = """What is shown in this image, and what is the relevance for materials design? Include a discussion of multi-agent AI.
+
+First brainstorm, then organize your thoughts, then respond."""
 
 url1 = "https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/02/Ants_Lede1300.jpg"
 
 response, messages,images= ask_about_image ( model, processor, question,
                                              images_input=[url1,],
                                              temperature=0.1,
+                                             system= '',
+                                             init_instr='You carefully study the image, and respond accurately, but succinctly. Think step-by-step.\n\n',
                                              show_conversation=True,
                                              max_new_tokens=512, messages=[], images=[])
 ```
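
Because the function returns the updated `messages` and `images` objects, they can be passed back in for a follow-up turn in the same conversation; a short sketch (the follow-up prompt below is illustrative):

```python
# Follow-up question in the same conversation, reusing the returned messages and images
follow_up = "Summarize the key design principle in one sentence."  # illustrative prompt
response, messages, images = ask_about_image(model, processor, follow_up,
                                             temperature=0.1,
                                             show_conversation=True,
                                             max_new_tokens=256,
                                             messages=messages, images=images)
```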
 
 <small>Image by [Vaishakh Manohar](https://www.quantamagazine.org/the-simple-algorithm-that-ants-use-to-build-bridges-20180226/)</small>
 
 <pre style="white-space: pre-wrap;">
+The image shows a group of ants working together to move a large object. This scene illustrates the concept of swarm intelligence, where individual agents (ants) collectively achieve a complex task through decentralized, self-organized behavior.
+
+In materials design, this concept can be applied to develop new materials and structures by mimicking the behavior of swarms. For instance, researchers have used swarm intelligence algorithms to optimize the design of composite materials, such as fiber-reinforced polymers, by simulating the behavior of ants or other swarming organisms. These algorithms can help identify the optimal arrangement of fibers to maximize strength and minimize weight.
 
+Multi-agent AI, which involves the coordination of multiple autonomous agents to achieve a common goal, can also be used in materials design. This approach can be applied to simulate the behavior of complex systems, such as biological tissues or nanomaterials, and optimize their properties through machine learning algorithms. By analyzing the behavior of individual agents and their interactions, researchers can develop new materials with improved performance and functionality.
 
+In summary, the image of ants working together to move a large object serves as a metaphor for the potential of swarm intelligence and multi-agent AI in materials design. By mimicking the behavior of swarms, researchers can develop new materials and structures with improved properties and functionality.
 </pre>
 
 ## Dataset generation