Using the Apple M1 chip causes an error (kernel death)
I don't have an Nvidia GPU, so I tried to use the M1 on my MacBook Air. However, executing the code below leads to a kernel death.
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline
model_id = "CompVis/stable-diffusion-v1-4"
device = "mps"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, revision="fp16", use_auth_token=True)
pipe = pipe.to(device)
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, guidance_scale=7.5)["sample"][0]
Note that pipe.to(device) executes successfully.
Has anyone made the M1 work yet? My PyTorch version is '1.13.0.dev20220823'.
We're working on exactly this! Pinging @pcuenq and @apolinario here as well
Please also check announcements on Twitter - we'll publish something about that soon!
Thanks!
In my case, on an Apple M1, with the following code
# make sure you're logged in with `huggingface-cli login`
import os
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# To swap out the noise scheduler, pass it to from_pretrained:
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear"
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'running on {device}')

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-3",
    scheduler=lms,
    torch_dtype=torch.float16,
    revision="fp16",
    use_auth_token=True,
    cache_dir=os.getenv("cache_dir", "./models")
).to(device)

prompt = "a photo of an astronaut riding a horse on mars"
with autocast(device):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
I get the following error:
Traceback (most recent call last):
File "diffuser.py", line 27, in <module>
image = pipe(prompt)["sample"][0]
...
File "/Documents/Projects/bloom/.venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2503, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
For device mps it doesn't work out of the box yet indeed; however, if device = cpu it should work. @loretoparisi, can you try removing the with autocast(device)? Autocast doesn't work for CPU as of now.
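For example, a minimal sketch of guarding the autocast context so it is only entered on CUDA (this assumes the imports, pipe, and device from the snippet above):
prompt = "a photo of an astronaut riding a horse on mars"

if device == "cuda":
    # autocast only helps on CUDA here; skip it on CPU
    with autocast(device):
        image = pipe(prompt)["sample"][0]
else:
    image = pipe(prompt)["sample"][0]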
Thanks, I slightly modified the code like this:
prompt = "a photo of an astronaut riding a horse on mars"
samples = 2
steps = 45
scale = 7.5
if device == 'cuda':
    with autocast(device):
        image = pipe(
            [prompt]*samples,
            num_inference_steps=steps,
            guidance_scale=scale,
        )["sample"][0]
else:
    image = pipe(prompt)["sample"][0]
but I'm still getting the same error:
Traceback (most recent call last):
File "diffuser.py", line 39, in <module>
image = pipe(prompt)["sample"][0]
File "/Projects/bloom/.venv/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
...
File "/Projects/bloom/.venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2503, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
@loretoparisi, oh this is probably because you are trying to load the fp16 version of the model - which also doesn't work on CPU 😅
Try this for pipe:
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # better model btw
    scheduler=lms,
    use_auth_token=True,
    cache_dir=os.getenv("cache_dir", "./models")
).to(device)
Thank you, it works on Apple M1 after removing autocast and fp16!
Here is the code for other people's convenience:
# make sure you're logged in with `huggingface-cli login`
import os
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# To swap out the noise scheduler, pass it to from_pretrained:
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear"
)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'running on {device}')

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # better model btw
    scheduler=lms,
    use_auth_token=True,
    cache_dir=os.getenv("cache_dir", "./models")
).to(device)

prompt = "a photo of an astronaut riding a horse on mars"
samples = 2
steps = 45
scale = 7.5
image = pipe(prompt)["sample"][0]
It definitely works, though very slowly.
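As a side note, the samples, steps, and scale variables in the code above are defined but never passed to the pipeline. A minimal sketch of wiring them in, keeping the older dict-style ["sample"] return used throughout this thread:
# Hypothetical extension: pass the batch size, step count, and guidance scale explicitly.
images = pipe(
    [prompt] * samples,
    num_inference_steps=steps,
    guidance_scale=scale,
)["sample"]
for i, image in enumerate(images):
    image.save(f"astronaut_rides_horse_{i}.png")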
On an M1 (not M1 Max) I get
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
if I don't specify revision and torch_dtype. The script python scripts/txt2img.py works to create images though, so I think it's an issue with diffusers and not stable-diffusion.
Can you say how you specify revision and torch_dtype?
I think txt2img.py is using the CPU if CUDA is not available.
To sgt101: I think you're running in CPU mode, because of the line that says device = 'cuda' if torch.cuda.is_available() else 'cpu'.
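For what it's worth, a minimal sketch of a device check that also looks for MPS (assuming PyTorch 1.12 or newer, where torch.backends.mps.is_available() exists):
import torch

# Prefer CUDA, then MPS (Apple Silicon), then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"running on {device}")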
I'm pretty sure my txt2img is using MPS (magnusviri's fork) because it takes about a minute to run instead of upwards of 30 minutes, among other things. I have
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16, revision="fp16",
    use_auth_token=True,
).to("mps")
But that errors out with:
0it [00:00, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Abort trap: 6
/Users/fragmede/miniforge3/envs/ldm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Figured it out! I opened a PR so Hugging Face can get my fix into diffusers to make MPS work.
It works after upgrading diffusers.
pip install -U diffusers
Remove torch_dtype=torch.float16 and it works for me.
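Putting those pieces together, a minimal sketch of that setup (assuming a diffusers release that includes the MPS fix, the default float32 weights, and the older dict-style ["sample"] return used throughout this thread):
from diffusers import StableDiffusionPipeline

# Load the default float32 weights; the fp16 revision is what triggers
# the LayerNorm / broadcast errors on CPU and MPS.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
).to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt)["sample"][0]
image.save("astronaut_rides_horse.png")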
Hey!
I have the same problem:
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<1x77x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
But I don't understand how to fix it. I'm an architect and I don't have coding skills. Can you please explain the method a bit (if there is one)?
Thanks a lot in advance!