Inference on CPU
#13
by wambugu71 - opened
How can the model be run on CPU, considering flash_attn doesn't support CPU?
Remove the flash_attn import by patching get_imports:

import os
from unittest.mock import patch
from transformers.dynamic_module_utils import get_imports

def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    # Only touch the imports of the Florence-2 modeling file
    if not str(filename).endswith("modeling_florence2.py"):
        return get_imports(filename)
    imports = get_imports(filename)
    if "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports
Use attn_implementation="sdpa" when loading the model:

from transformers import AutoModelForCausalLM

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):  # workaround for the unnecessary flash_attn requirement
    model = AutoModelForCausalLM.from_pretrained(model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True)
With this, you can run inference on a CPU, or on a GPU that doesn't support flash_attn.
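For completeness, here is a minimal end-to-end sketch of CPU inference that reuses patch and fixed_get_imports from above; the checkpoint name, image path, and task prompt are placeholder assumptions, so adjust them to your setup:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Florence-2-large"  # assumed checkpoint, replace with the one you use
dtype = torch.float32  # CPU-friendly dtype

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")
generated_ids = model.generate(input_ids=inputs["input_ids"], pixel_values=inputs["pixel_values"], max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=False)[0])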
Thanks, it works now.
wambugu71 changed discussion status to closed