---
base_model: unsloth/qwen2-0.5b-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - trl
  - sft
license: apache-2.0
language:
  - en
  - ko
datasets:
  - prismdata/KDI-DATASET-2014
metrics:
  - accuracy
---

# Uploaded model

- **Developed by:** prismdata
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen2-0.5b-bnb-4bit

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
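
For context, the sketch below shows what such an Unsloth + TRL SFT run typically looks like. It is illustrative only: the LoRA settings, trainer hyperparameters, and the `text` column name for `prismdata/KDI-DATASET-2014` are assumptions, not the actual training configuration.

```python
# Illustrative SFT sketch -- hyperparameters and dataset column names are assumptions.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model named in the card metadata.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2-0.5b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank/alpha and target modules are placeholder values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("prismdata/KDI-DATASET-2014", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed column name
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```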

Because it is only a 0.5B-parameter model, it does not require much memory and can even run on CPU, as in the sample below.

## Inference sample

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import time

device = "cpu"

# Load the fine-tuned model and tokenizer.
# Note: device_map applies to the model only; tokenizers do not take it.
model = AutoModelForCausalLM.from_pretrained(
    "prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./", device_map=device
)
tokenizer = AutoTokenizer.from_pretrained(
    "prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./"
)

# Vicuna-style prompt template used by the model.
prompt_template = """A chat between a curious user and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: {prompt}
Assistant:
"""
text = "Centrelink가 뭐야?"  # Korean: "What is Centrelink?"
model_inputs = tokenizer(prompt_template.format(prompt=text), return_tensors="pt").to(device)

# Time only the generation step.
start = time.time()
outputs = model.generate(**model_inputs, max_new_tokens=256)
end = time.time()

output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(output_text)
print(f"{end - start:.5f} sec")
```