---
base_model: unsloth/qwen2-0.5b-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- sft
license: apache-2.0
language:
- en
- ko
datasets:
- prismdata/KDI-DATASET-2014
metrics:
- accuracy
---

# Uploaded model

- **Developed by:** prismdata
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen2-0.5b-bnb-4bit

This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

# Low memory requirements

At 0.5B parameters, the model does not need much memory; the sample below runs inference on CPU.

# Inference sample

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cpu"

# Load the model and tokenizer (the tokenizer takes no device_map argument).
model = AutoModelForCausalLM.from_pretrained("prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("prismdata/KDI-Qwen2-instruction-0.5B", cache_dir="./")

prompt_template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: {prompt}\nAssistant:\n"""

text = "Centrelink가 뭐야?"  # Korean: "What is Centrelink?"
model_inputs = tokenizer(prompt_template.format(prompt=text), return_tensors="pt").to(device)

start = time.time()
outputs = model.generate(**model_inputs, max_new_tokens=256)  # generate() already returns tensors on the model's device
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(output_text)
end = time.time()
print(f"{end - start:.5f} sec")
```
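
Since the base checkpoint is a bnb-4bit quantization, GPU memory use can also be kept low by loading the model in 4-bit. A minimal sketch, assuming a CUDA device with `bitsandbytes` installed; the quantization settings here are illustrative, not part of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit settings (assumption, not the card author's configuration).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "prismdata/KDI-Qwen2-instruction-0.5B",
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized weights on the available GPU
)
tokenizer = AutoTokenizer.from_pretrained("prismdata/KDI-Qwen2-instruction-0.5B")
```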
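
# Fine-tuning sketch

The card states the model was fine-tuned from `unsloth/qwen2-0.5b-bnb-4bit` with Unsloth and TRL (see the `trl` and `sft` tags). A minimal sketch of that pattern follows; the LoRA rank, sequence length, and trainer hyperparameters are assumptions, as is the dataset exposing a `text` column. This is not the author's actual training script:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base checkpoint through Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2-0.5b-bnb-4bit",
    max_seq_length=2048,  # assumption
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("prismdata/KDI-DATASET-2014", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumption: the dataset has a "text" column
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,  # illustrative
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```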