NameError: name 'flash_attn_func' is not defined

#4
by Neman - opened

I have flash-attn installed (v 2.5.2), but I get:
Exception has occurred: NameError
name 'flash_attn_func' is not defined
File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 65, in
_flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
File "/home/neman/PROGRAMMING/PYTHON/DuckDuckGo_Search/AQLM_test1.py", line 3, in
quantized_model = AutoModelForCausalLM.from_pretrained(
NameError: name 'flash_attn_func' is not defined
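
If it helps, the failing line is just a feature probe on flash_attn_func. The same probe can be run standalone (a quick diagnostic, not part of the model code) to see whether flash-attn actually imports against the installed torch build:

import inspect

# This import fails if the flash-attn wheel doesn't match the installed torch,
# which is what leaves flash_attn_func undefined in the modeling code.
from flash_attn import flash_attn_func

# The modeling code checks for sliding-window support the same way:
print("window_size" in inspect.signature(flash_attn_func).parameters)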

Toy code:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the 2-bit AQLM-quantized Mixtral from a local path
quantized_model = AutoModelForCausalLM.from_pretrained(
    "/mnt/disk2/LLM_MODELS/models/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf",
    torch_dtype="auto", device_map="auto", low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Short warm-up generation on an empty prompt, then the actual 128-token run
output = quantized_model.generate(tokenizer("", return_tensors="pt")["input_ids"].cuda(), max_new_tokens=10)
output = quantized_model.generate(tokenizer("I'm AQLM, ", return_tensors="pt")["input_ids"].cuda(), min_new_tokens=128, max_new_tokens=128)
print(tokenizer.decode(output[0]))
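
For reference, the exact versions in play can be printed with this quick diagnostic snippet (nothing model-specific):

from importlib.metadata import version
for pkg in ("torch", "transformers", "accelerate", "flash-attn"):
    print(pkg, version(pkg))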

Any ideas?

UPDATE:
I checked and saw that flash-attn 2.5.3 was released a few days ago, so I updated. Now I get a different error:
Exception has occurred: RuntimeError
Only Tensors of floating point and complex dtype can require gradients
File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 308, in init
self.q_proj = QuantizedLinear(self.hidden_size, self.num_heads * self.head_dim, bias=False, **config.aqlm)
File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 889, in init
self.self_attn = MIXTRAL_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1093, in
[MixtralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1093, in init
[MixtralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1277, in init
self.model = MixtralModel(config)
File "/home/neman/PROGRAMMING/PYTHON/DuckDuckGo_Search/AQLM_test1.py", line 3, in
quantized_model = AutoModelForCausalLM.from_pretrained(
RuntimeError: Only Tensors of floating point and complex dtype can require gradients

IST Austria Distributed Algorithms and Systems Lab org

Installing the latest accelerate will fix the second error.
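
For context: QuantizedLinear stores its quantization codes in integer tensors, and an integer tensor can only be an nn.Parameter with requires_grad=False. As far as I understand (this is my reading of the bug, not a pinned commit), older accelerate versions dropped that flag when re-creating parameters during empty-weight initialization, which trips PyTorch's check. A minimal sketch of the underlying PyTorch behavior:

import torch
import torch.nn as nn

codes = torch.zeros(8, dtype=torch.int8)  # integer codes, as in QuantizedLinear

ok = nn.Parameter(codes, requires_grad=False)  # fine: gradients not requested

try:
    bad = nn.Parameter(codes)  # default requires_grad=True is invalid for int dtypes
except RuntimeError as e:
    print(e)  # "Only Tensors of floating point and complex dtype can require gradients"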

It did. Thank you for the support.

BlackSamorez changed discussion status to closed
