---
base_model: unsloth/qwen2-0.5b-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- sft
license: apache-2.0
language:
- en
- ko
datasets:
- prismdata/KDI-DATASET-2014
metrics:
- accuracy
---

# Uploaded model

- **Developed by:** prismdata
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen2-0.5b-bnb-4bit

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

# Low memory requirements

At 0.5B parameters, the model does not need much memory; the sample below runs inference on CPU.

# Inference sample

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cpu"

# Load the model and tokenizer (the tokenizer takes no device_map argument).
model = AutoModelForCausalLM.from_pretrained("prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./")

prompt_template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: {prompt}\nAssistant:\n"""

text = "Centrelink가 뭐야?"  # Korean: "What is Centrelink?"
model_inputs = tokenizer(prompt_template.format(prompt=text), return_tensors="pt").to(device)

start = time.time()
outputs = model.generate(**model_inputs, max_new_tokens=256)  # generate() already returns tensors on the model's device
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(output_text)
end = time.time()
print(f"{end - start:.5f} sec")
```
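
Since the base checkpoint is a bnb-4bit quantization, GPU memory use can also be kept low by loading the model in 4-bit. A minimal sketch, assuming a CUDA device with `bitsandbytes` installed; the quantization settings here are illustrative, not part of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit settings (assumption, not the card author's configuration).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "prismdata/KDI-Qwen2-instruction-0.5B",
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized weights on the available GPU
)
tokenizer = AutoTokenizer.from_pretrained("prismdata/KDI-Qwen2-instruction-0.5B")
```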
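
# Fine-tuning sketch

The card states the model was fine-tuned from `unsloth/qwen2-0.5b-bnb-4bit` with Unsloth and TRL (see the `trl` and `sft` tags). A minimal sketch of that pattern follows; the LoRA rank, sequence length, and trainer hyperparameters are assumptions, as is the dataset exposing a `text` column. This is not the author's actual training script:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base checkpoint through Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2-0.5b-bnb-4bit",
    max_seq_length=2048,  # assumption
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("prismdata/KDI-DATASET-2014", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumption: the dataset has a "text" column
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,  # illustrative
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```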