Multi-image but single-turn example

#3
by pcuenq HF staff - opened
Files changed (1)
  1. README.md +5 -14
README.md CHANGED
@@ -74,22 +74,10 @@ messages = [
  "role": "user",
  "content": [
  {"type": "image"},
- {"type": "text", "text": "What do we see in this image?"}
- ]
- },
- {
- "role": "assistant",
- "content": [
- {"type": "text", "text": "This image shows a city skyline with prominent landmarks."}
- ]
- },
- {
- "role": "user",
- "content": [
  {"type": "image"},
- {"type": "text", "text": "And how about this image?"}
+ {"type": "text", "text": "Can you describe the two images?"}
  ]
- }
+ },
  ]

  # Prepare inputs
@@ -108,6 +96,9 @@ print(generated_texts[0])
  ```


+ > The first image shows a statue of the Statue of Liberty in front of a city skyline. The statue is green and is on a pedestal. The city skyline includes many tall buildings and skyscrapers. The sky is clear and blue. The water in the foreground is calm and blue. The second image shows a bee on a pink flower. The flower is surrounded by green leaves.
+
+
  ### Model optimizations

  **Precision**: For better performance, load and run the model in half-precision (`torch.float16` or `torch.bfloat16`) if your hardware supports it.
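
For reference, the single-turn, multi-image `messages` structure introduced by this change can be sketched in isolation as below. This is only an illustration: `count_image_placeholders` and the `images` stand-in list are hypothetical names that do not appear in the README, and the check assumes the processor expects one `{"type": "image"}` placeholder per image passed to it.

```python
# Hypothetical helper, not part of the README or of any library.
def count_image_placeholders(messages):
    """Count {"type": "image"} entries across all turns."""
    return sum(
        1
        for message in messages
        for part in message["content"]
        if part.get("type") == "image"
    )


# A single user turn carrying two image placeholders and one text prompt,
# mirroring the structure added in this change.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "image"},
            {"type": "text", "text": "Can you describe the two images?"},
        ],
    },
]

# Stand-ins for the two loaded images; real code would pass image objects.
images = ["image1", "image2"]

# Assumption: placeholders and images should match one-to-one.
assert count_image_placeholders(messages) == len(images)
```

A check like this, run before preparing inputs, is one way to catch a mismatch between the prompt and the list of images early.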