quarterturn committed on
Commit d8143df
Parent(s): 83af106
caption.py now supports 4-bit; added 4-bit quant of Molmo
- README.md +7 -4
- caption.py +26 -10
- images/00000-203229662.png +3 -0
- images/00000-203229662.txt +1 -0
- images/00006-2234503665.png +3 -0
- images/00006-2234503665.txt +1 -0
- images/00030-4075734474.png +3 -0
- images/00030-4075734474.txt +1 -0
- model/molmo-7B-D-bnb-4bit +1 -0
README.md
CHANGED
````diff
@@ -11,7 +11,10 @@ Install:
 2. cd to "models" and clone Molmo-7B-D-0924:
 ```
 git lfs install
-git clone https://huggingface.co/allenai/Molmo-7B-D-0924
+git clone https://huggingface.co/allenai/Molmo-7B-D-0924
+```
+Since the 4-bit quant isn't that large, I have included it here. There's no need to clone it separately. The full 32-bit version is big, so I leave it up to you to clone it if you want it.
+
 1. create a python3 venv or use conda to create an environment, eg:
 ``` conda create -n caption python=3.11 ```
 2. activate your environment, eg:
@@ -26,11 +29,11 @@ Install:
 4. click the button to download the caption zip file, the link is at the top of the page
 
 run the command-line version:
-``` python3 caption.py ```
+``` python3 caption.py ``` (use molmo at bf16 for more accuracy; needs 24GB GPU)
+``` python3 caption.py -q ``` (use molmo at int4; should be fine with 12GB GPU)
 1. make sure your images are in the "images" directory
 2. captions will be placed in the "images" directory
 
 Note:
--
-- You can edit the scripts to use a lower quant of the model, such as fp8, though accuracy may be lower.
+- main.py (gradio version does not yet support quant model)
 - If torch sees your first GPU supports flash attention and the others do not, it will assume all the cards do and it will throw an exception. A workaround is to use, for example, "CUDA_VISIBLE_DEVICES=0 python3 main.py (or caption.py)", to force torch to ignore the card supporting flash attention, so that it will use your other cards without it. Or, use it to exclude non-flash-attention-supporting GPUs.
````
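The flash-attention workaround in the README relies on `CUDA_VISIBLE_DEVICES`, which takes a comma-separated list of GPU indices; the CUDA runtime then exposes only those cards, renumbered from 0. So `CUDA_VISIBLE_DEVICES=0` exposes only the first card, and hiding one card means listing all the others. The variable must be set before the process initializes CUDA, which is why it is given on the command line when launching the script. A minimal sketch, assuming you know how many GPUs the machine has and which indices to hide; `visible_devices` and `env_with_visible_devices` are hypothetical helper names, not part of this repository:

```python
from __future__ import annotations

import os


def visible_devices(gpu_count: int, exclude: set[int]) -> str:
    """Build a CUDA_VISIBLE_DEVICES value exposing every GPU index except those in `exclude`."""
    return ",".join(str(i) for i in range(gpu_count) if i not in exclude)


def env_with_visible_devices(devices: str) -> dict[str, str]:
    """Copy the current environment and restrict which GPUs a child process may see."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = devices
    return env
```

For example, on a hypothetical three-GPU machine where card 0 is the only one with flash-attention support, `subprocess.run(["python3", "caption.py"], env=env_with_visible_devices(visible_devices(3, {0})), check=True)` would start caption.py with only cards 1 and 2 visible (torch sees them as devices 0 and 1). The indices follow CUDA's device ordering, which can differ from `nvidia-smi` unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is also set.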
caption.py
CHANGED
```diff
@@ -1,9 +1,15 @@
 import os
+import argparse
 import torch
 from PIL import Image
 import requests
 from transformers import AutoProcessor, AutoModelForCausalLM, GenerationConfig, BitsAndBytesConfig
 
+# Parse command-line arguments
+parser = argparse.ArgumentParser(description="Load and use a quantized model")
+parser.add_argument("-q", "--use_quant", action="store_true", help="Use quantized model")
+args = parser.parse_args()
+
 if torch.cuda.is_available():
     device = torch.device("cuda")
     print("GPU is available. Using CUDA.")
@@ -11,7 +17,7 @@ else:
     device = torch.device("cpu")
     print("GPU is not available. Using CPU.")
 
-#
+# Load the processor
 local_path = "./model/Molmo-7B-D-0924"
 processor = AutoProcessor.from_pretrained(
     local_path,
@@ -21,15 +27,25 @@ processor = AutoProcessor.from_pretrained(
     device_map='auto'
 )
 
-
-
-
-
-
-
-
-
-
+# Load the model
+if args.use_quant:
+    # Load the quantized model
+    quantized_local_path = "./model/molmo-7B-D-bnb-4bit"
+    model = AutoModelForCausalLM.from_pretrained(
+        quantized_local_path,
+        trust_remote_code=True,
+        torch_dtype='auto',
+        device_map='auto',
+    )
+else:
+    # Load the non-quantized model
+    model = AutoModelForCausalLM.from_pretrained(
+        local_path,
+        trust_remote_code=True,
+        torch_dtype='auto',
+        device_map='auto',
+    )
+    model.to(dtype=torch.bfloat16)
 
 # directory containing the images
 image_directory = "./images"
```
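The `-q` switch added to caption.py boils down to choosing one of two checkpoint directories. A minimal sketch that isolates that choice so it can be checked without loading a model; `parse_args` and `model_path` are hypothetical helper names, and accepting `argv` explicitly is an assumption made for testability (the script itself calls `parser.parse_args()` with no arguments):

```python
from __future__ import annotations

import argparse

# Directory names copied from the commit; adjust if your layout differs.
FULL_MODEL_PATH = "./model/Molmo-7B-D-0924"
QUANT_MODEL_PATH = "./model/molmo-7B-D-bnb-4bit"


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the same -q/--use_quant flag that caption.py defines."""
    parser = argparse.ArgumentParser(description="Load and use a quantized model")
    parser.add_argument("-q", "--use_quant", action="store_true", help="Use quantized model")
    return parser.parse_args(argv)


def model_path(args: argparse.Namespace) -> str:
    """Return the checkpoint directory, mirroring the if/else in caption.py."""
    return QUANT_MODEL_PATH if args.use_quant else FULL_MODEL_PATH
```

One detail worth noting in the committed code: the processor is always loaded from `./model/Molmo-7B-D-0924`, so the `-q` path as written still expects that directory to be present even though only the 4-bit weights are used.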
images/00000-203229662.png
ADDED
images/00000-203229662.txt
ADDED
```diff
@@ -0,0 +1 @@
+The image depicts a striking scene from the movie Black Swan. The ballerina, portrayed by Natalie Portman, is standing in a powerful, dramatic pose. Her body is elongated, with her arms outstretched to the sides, creating a sense of balance and tension. Her head is tilted back, gazing upwards, which adds to the intensity of the composition. The dancer's skin is pale, and her eyes are wide open, likely a deep blue or green, though the exact color is difficult to discern in this still. Her hair is dark, possibly black or dark brown, and appears to be styled in an elegant updo, though the exact style is not clear from this angle. The ballerina's body type is slender and athletic, reflecting her profession as a dancer. She is wearing a black leotard, which contrasts sharply with her pale skin and dark hair. The lighting in the scene is dramatic, with shadows playing across her face and body, emphasizing the intensity of the moment. This image captures the essence of the film's themes of obsession, dedication, and the psychological toll of pursuing perfection in dance.
```
images/00006-2234503665.png
ADDED
images/00006-2234503665.txt
ADDED
```diff
@@ -0,0 +1 @@
+The image depicts a striking scene from the movie Black Swan. The ballerina, portrayed by Natalie Portman, is standing in a powerful, dramatic pose. Her body is elongated, with her arms outstretched to the sides, creating a sense of balance and tension. Her head is tilted back, gazing upwards, which adds to the intensity of the composition. The dancer's skin is pale, and her eyes are wide open, likely a deep blue or green, though the exact color is difficult to discern in this still. Her hair is dark, possibly black or dark brown, and appears to be styled in an elegant updo, though the exact style is not clear from this angle. The ballerina's body type is slender and athletic, reflecting her profession as a dancer. She is wearing a black leotard, which contrasts sharply with her pale skin and dark hair. The lighting in the scene is dramatic, with shadows playing across her face and body, emphasizing the intensity of the moment. This image captures the essence of the film's themes of obsession, dedication, and the psychological toll of pursuing perfection in dance.
```
images/00030-4075734474.png
ADDED
images/00030-4075734474.txt
ADDED
```diff
@@ -0,0 +1 @@
+The image depicts a striking scene from the movie Black Swan. The ballerina, portrayed by Natalie Portman, is standing in a powerful, dramatic pose. Her body is elongated, with her arms outstretched to the sides, creating a sense of balance and tension. Her head is tilted back, gazing upwards, which adds to the intensity of the composition. The dancer's skin is pale, and her eyes are wide open, likely a deep blue or green, though the exact color is difficult to discern in this still. Her hair is dark, possibly black or dark brown, and appears to be styled in an elegant updo, though the exact style is not clear from this angle. The ballerina's body type is slender and athletic, reflecting her profession as a dancer. She is wearing a black leotard, which contrasts sharply with her pale skin and dark hair. The lighting in the scene is dramatic, with shadows playing across her face and body, emphasizing the intensity of the moment. This image captures the essence of the film's themes of obsession, dedication, and the psychological toll of pursuing perfection in dance.
```
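The files added under `images/` show the naming convention the README describes: each caption is written into the same directory as its image and shares the image's stem (`00000-203229662.png` pairs with `00000-203229662.txt`). The captioning loop itself is not part of this diff, so the following is only a sketch of that pairing, assuming `.txt` captions next to the images; `caption_path`, `images_missing_captions`, and the `IMAGE_SUFFIXES` set are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

# Assumption: which extensions count as images; caption.py's own filter is not shown in this diff.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def caption_path(image_path: Path) -> Path:
    """Caption file that sits next to an image and shares its stem."""
    return image_path.with_suffix(".txt")


def images_missing_captions(directory: Path) -> list[Path]:
    """Images in `directory` that do not yet have a matching .txt caption, sorted by path."""
    return sorted(
        p
        for p in directory.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and not caption_path(p).exists()
    )
```

A helper like this could let a rerun skip images that already have captions; whether caption.py overwrites existing `.txt` files is not visible in this commit.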
model/molmo-7B-D-bnb-4bit
ADDED
```diff
@@ -0,0 +1 @@
+Subproject commit 51097c4251a023d72485963c1ab69f3b6d6a1ec6
```
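The README's reasons for bundling the 4-bit checkpoint as a submodule while leaving the full 32-bit model for the user to clone, and its 24 GB (bf16) versus 12 GB (int4) guidance, come down to bytes per parameter. A back-of-envelope sketch, assuming a nominal 7 billion parameters taken from the model's name. The real checkpoint also carries a vision encoder, bitsandbytes may keep some layers in higher precision and stores quantization constants, and inference needs activations and cache on top, so treat these figures as lower bounds on weight storage, not measured sizes:

```python
BYTES_PER_GIB = 1024 ** 3


def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Storage for the weights alone, in GiB: params * bits / 8 bytes."""
    return num_params * bits_per_param / 8 / BYTES_PER_GIB


if __name__ == "__main__":
    # 7e9 is the nominal size from the model name, not an exact parameter count.
    for label, bits in (("fp32", 32), ("bf16", 16), ("int4", 4)):
        print(f"{label}: {weight_gib(7e9, bits):.1f} GiB")
```

This prints roughly 26.1, 13.0 and 3.3 GiB, which is consistent with the 32-bit checkpoint being too large to bundle and with bf16 needing a card well above 12 GB once runtime overhead is added.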