Inference on CPU
#13
by wambugu71 - opened
How can the model be run on CPU, considering flash_attn doesn't support CPU?
Remove the flash_attn import by patching get_imports:

import os
from unittest.mock import patch
from transformers.dynamic_module_utils import get_imports

def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    # Only touch the imports of the Florence-2 modeling file
    if not str(filename).endswith("modeling_florence2.py"):
        return get_imports(filename)
    imports = get_imports(filename)
    if "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports
Use attn_implementation="sdpa" when loading the model:

from transformers import AutoModelForCausalLM

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):  # workaround for the unnecessary flash_attn requirement
    model = AutoModelForCausalLM.from_pretrained(model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True)
With this, you can run inference on a CPU, or on a GPU that doesn't support flash_attn.
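For completeness, here is a minimal end-to-end sketch of CPU inference that reuses patch and fixed_get_imports from above; the checkpoint name, image path, and task prompt are placeholder assumptions, so adjust them to your setup:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Florence-2-large"  # assumed checkpoint, replace with the one you use
dtype = torch.float32  # CPU-friendly dtype

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(model_path, attn_implementation="sdpa", torch_dtype=dtype, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")
generated_ids = model.generate(input_ids=inputs["input_ids"], pixel_values=inputs["pixel_values"], max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=False)[0])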
Thanks, it works now.
wambugu71 changed discussion status to closed