How to run CPU mode in AWQ?
I tried to test the AWQ model in CPU mode following the Quickstart manual, but it failed to generate any output.
How can I run AWQ in CPU mode?
I get this error: NameError: name 'flash_attn_func' is not defined
Could you help me with this?
Thank you.
######################################
env
autoawq 0.2.7.post3
transformers 4.46.3
intel_extension_for_pytorch 2.5.0
######################################
test code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq import AutoAWQForCausalLM
from awq.utils.utils import get_best_device
device = get_best_device()
model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct-AWQ"
model = AutoAWQForCausalLM.from_quantized(
    model_name,
    use_ipex=True,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Choose your prompt
prompt = "Explain how wonderful you are" # English example
prompt = "스스로를 자랑해 봐" # Korean example
messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)
output = model.generate(
    input_ids.to("cpu"),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
Hello @joyUniverse, we apologize for the delayed response.
Let me guide you through a few points:
- When using CPU, you need to remove device_map="auto" from your code.
- There's no need to move input_ids to cpu, as it's already in CPU memory (RAM).
After implementing these changes, the original error might be resolved, though other unexpected issues may arise.
If you encounter any new errors, please share them with us so we can help resolve the issues more efficiently.
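For reference, here is a minimal sketch of your script with the two changes above applied. We haven't verified it in a CPU environment yet, so please treat it as a starting point; the model name, prompt, and generation settings are kept from your post.

import torch
from transformers import AutoTokenizer
from awq import AutoAWQForCausalLM

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct-AWQ"

# Load the quantized model for CPU inference via IPEX.
# Note: device_map="auto" is removed here.
model = AutoAWQForCausalLM.from_quantized(
    model_name,
    use_ipex=True,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain how wonderful you are"
messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

# input_ids is already on CPU, so no .to("cpu") is needed.
output = model.generate(
    input_ids,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))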
For CPU inference, the AutoAWQ documentation suggests installing the required dependencies using:
pip install autoawq[cpu]
Please note that AWQ was primarily designed for GPU inference, and we haven't thoroughly tested it in CPU environments yet.
We recommend trying the code modifications suggested above and referring to the AutoAWQ documentation. We'll update you once our testing is complete.
Thank you for your patience and understanding.