Uploaded model
- Developed by: prismdata
- License: apache-2.0
- Finetuned from model: unsloth/qwen2-0.5b-bnb-4bit
This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
It does not require much memory.
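For an even smaller footprint, the weights can be loaded in 4-bit. This is a minimal sketch, assuming a CUDA GPU and the bitsandbytes package are available; the plain CPU load is shown in the inference sample below.

```python
# Optional: 4-bit quantized load to reduce memory further.
# Assumes a CUDA GPU with bitsandbytes installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "prismdata/KDI-Qwen2-instruction-0.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
```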
Inference sample
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import time

device = "cpu"
model = AutoModelForCausalLM.from_pretrained(
    "prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./", device_map=device
)
# Tokenizers do not live on a device, so no device_map is passed here.
tokenizer = AutoTokenizer.from_pretrained(
    "prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./"
)

prompt_template = """A chat between a curious user and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: {prompt}\nAssistant:\n"""
text = 'Centrelink가 뭐야?'  # Korean: "What is Centrelink?"

model_inputs = tokenizer(prompt_template.format(prompt=text), return_tensors='pt').to(device)

# Time only the generation step; the output tensor is already on `device`.
start = time.time()
outputs = model.generate(**model_inputs, max_new_tokens=256)
end = time.time()

output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(output_text)
print(f"{end - start:.5f} sec")
```