Model Card for WavGPT-1.0-merged
Model Details
Model Description
- Developed by: hack337
- Model type: qwen2
- Finetuned from model: Qwen/Qwen2-1.5B-Instruct
Model Sources
- Repository: https://huggingface.co/Hack337/WavGPT-1.0
- Demo: https://huggingface.co/spaces/Hack337/WavGPT
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" lets transformers place the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")

prompt = "Give me a short introduction to large language model."
messages = [
    # System prompt (Russian): "You are a very helpful assistant."
    {"role": "system", "content": "Вы очень полезный помощник."},
    {"role": "user", "content": prompt}
]
# Render the conversation into the prompt string the model expects
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the device the model was actually loaded on
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Passing the full encoding forwards the attention mask along with the input IDs
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
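To make the apply_chat_template step less opaque, the sketch below builds a prompt string by hand. It assumes the tokenizer inherits the ChatML-style template of the Qwen2 base model. The helper name render_chatml and the sample messages are hypothetical and used only for illustration. The template shipped with the tokenizer remains the authoritative source.

```python
# Hypothetical helper: mimics a ChatML-style chat template for illustration only.
# Assumption: the model keeps the Qwen2 base template; check the tokenizer to confirm.
def render_chatml(messages, add_generation_prompt=True):
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn tells the model to start answering
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a very helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
chatml_text = render_chatml(messages)
print(chatml_text)
```

Comparing this output with the string returned by tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) is a quick way to check whether the assumption holds for this checkpoint.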
Use the code below to run the model on an Intel NPU with intel_npu_acceleration_library.
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the NPU-optimized model without LoRA
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    use_cache=True,
    dtype=torch.float16  # Use float16 for the NPU
).eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")
tokenizer.pad_token_id = tokenizer.eos_token_id
# Stream decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language model."
messages = [
    # System prompt (Russian): "You are a very helpful assistant."
    {"role": "system", "content": "Вы очень полезный помощник."},
    {"role": "user", "content": prompt}
]

# Convert to a text format compatible with the model
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU
print("Run inference")
_ = model.generate(**generation_kwargs)
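The generation settings above combine top_k=50 with top_p=0.9. The following is a minimal pure-Python sketch of what those two filters do to a next-token distribution. It illustrates the general technique and is not the transformers implementation. The function name filter_top_k_top_p and the toy probabilities are made up for the example.

```python
# Illustrative only: top-k followed by top-p (nucleus) filtering on a toy distribution.
def filter_top_k_top_p(probs, top_k=50, top_p=0.9):
    """Return {token_index: renormalized_prob} after top-k, then top-p filtering."""
    # Rank token indices from most to least probable and keep the top_k best
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    kept_mass = sum(probs[i] for i in ranked)

    # Keep the smallest prefix whose renormalized cumulative mass reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a valid distribution to sample from
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
filtered = filter_top_k_top_p(toy_probs, top_k=3, top_p=0.8)
print(filtered)
```

With do_sample=True, the next token is drawn at random from the surviving candidates, so lower top_k or top_p values make the output more conservative and higher values make it more varied.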
Framework versions
- PEFT 0.11.1