Multi-image but single-turn example (#3)
Opened by pcuenq (HF staff)

README.md CHANGED
````diff
@@ -74,22 +74,10 @@ messages = [
         "role": "user",
         "content": [
             {"type": "image"},
-            {"type": "text", "text": "What do we see in this image?"}
-        ]
-    },
-    {
-        "role": "assistant",
-        "content": [
-            {"type": "text", "text": "This image shows a city skyline with prominent landmarks."}
-        ]
-    },
-    {
-        "role": "user",
-        "content": [
             {"type": "image"},
-            {"type": "text", "text": "
+            {"type": "text", "text": "Can you describe the two images?"}
         ]
-    }
+    },
 ]
 
 # Prepare inputs
@@ -108,6 +96,9 @@ print(generated_texts[0])
 ```
 
 
+> The first image shows a statue of the Statue of Liberty in front of a city skyline. The statue is green and is on a pedestal. The city skyline includes many tall buildings and skyscrapers. The sky is clear and blue. The water in the foreground is calm and blue. The second image shows a bee on a pink flower. The flower is surrounded by green leaves.
+
+
 ### Model optimizations
 
 **Precision**: For better performance, load and run the model in half-precision (`torch.float16` or `torch.bfloat16`) if your hardware supports it.
````
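The single-turn, multi-image message structure this PR introduces can be sanity-checked without loading the model: each `{"type": "image"}` placeholder in the conversation must be matched by one image passed to the processor. A minimal sketch (the commented-out processor calls assume a `transformers` `AutoProcessor`; variable names like `image1`/`image2` are illustrative, not from the diff):

```python
# The single-turn, multi-image conversation from the updated README.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "image"},
            {"type": "text", "text": "Can you describe the two images?"},
        ],
    },
]

# Count the image placeholders: the number of images handed to the
# processor must match this value.
num_image_slots = sum(
    1
    for message in messages
    for part in message["content"]
    if part["type"] == "image"
)
print(num_image_slots)  # prints 2

# With transformers, the flow would then look roughly like (not run here):
# processor = AutoProcessor.from_pretrained(model_id)
# prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
# inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
```

Because everything fits in a single user turn, the model describes both images in one response, as the quoted sample output above shows.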
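One way to apply the "Precision" tip above is to pick `bfloat16` when the hardware supports it and fall back to `float16` otherwise. A minimal sketch; the model-loading call is only in comments, and `model_id` stands in for whatever checkpoint the README uses:

```python
import torch

# Prefer bfloat16 on GPUs that support it (wider exponent range, more
# numerically robust than float16); otherwise fall back to float16.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float16

# The model would then be loaded in half precision, e.g.:
# from transformers import AutoModelForVision2Seq
# model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=dtype)
print(dtype)
```

On CPU-only machines this selects `torch.float16`; the dtype choice only pays off on hardware with native half-precision support.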