---
language:
- ko
- en
license: apache-2.0
tags:
- text-generation
- qwen2.5
- korean
- instruct
- mlx
- 8bit
pipeline_tag: text-generation
---
# Qwen2.5-7B-Instruct-kowiki-qa-8bit (MLX-converted model)
- Original model: beomi/Qwen2.5-7B-Instruct-kowiki-qa

## Requirement

```bash
pip install mlx-lm
```

## Usage
- Command line (the prompt means "Why is the sky blue?"):

```bash
mlx_lm.generate --model sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit --prompt "하늘이 파란 이유가 뭐야?"
```
- Python:

```python
from mlx_lm import load, generate

# Load the 8-bit MLX weights and the tokenizer.
model, tokenizer = load(
    "sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
    tokenizer_config={"trust_remote_code": True},
)

prompt = "하늘이 파란 이유가 뭐야?"  # "Why is the sky blue?"
messages = [
    {"role": "system", "content": "당신은 친절한 챗봇입니다."},  # "You are a friendly chatbot."
    {"role": "user", "content": prompt},
]

# Render the chat messages into a single prompt string that ends with the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

text = generate(
    model,
    tokenizer,
    prompt=prompt,
    # verbose=True,
    # max_tokens=8196,
    # temp=0.0,
)
print(text)
```
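- OpenAI-compatible server:

```bash
mlx_lm.server --model sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit --host 0.0.0.0
```

Then query the server with the OpenAI Python client:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    # The OpenAI client refuses to start without an API key; a placeholder is used here,
    # assuming the local server does not check it.
    api_key="not-needed",
)

prompt = "하늘이 파란 이유가 뭐야?"  # "Why is the sky blue?"
messages = [
    {"role": "system", "content": "당신은 친절한 챗봇입니다."},  # "You are a friendly chatbot."
    {"role": "user", "content": prompt},
]

res = client.chat.completions.create(
    model="sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
    messages=messages,
    temperature=0.2,
)
print(res.choices[0].message.content)
```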
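
The server example above sends a single question. For multi-turn use, the message list can be built by a small helper. The following is a minimal sketch, not part of this model card's original code: `build_messages` and `ask` are hypothetical helper names, the placeholder `api_key` is an assumption (the local server is assumed not to check it), and the URL and model name are the ones used above.

```python
DEFAULT_MODEL = "sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit"
DEFAULT_SYSTEM_PROMPT = "당신은 친절한 챗봇입니다."  # "You are a friendly chatbot."


def build_messages(question, history=None, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a chat message list: system prompt, prior turns, then the new question.

    Hypothetical helper; `history` is a list of {"role", "content"} dicts from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": question})
    return messages


def ask(question, history=None, base_url="http://localhost:8080/v1"):
    """Send one question to a running mlx_lm.server and return the reply text."""
    import openai  # imported lazily so build_messages can be used without the package

    # Placeholder key: an assumption that the local server does not validate it.
    client = openai.OpenAI(base_url=base_url, api_key="not-needed")
    res = client.chat.completions.create(
        model=DEFAULT_MODEL,
        messages=build_messages(question, history),
        temperature=0.2,
    )
    return res.choices[0].message.content
```

With the server running, `ask("하늘이 파란 이유가 뭐야?")` would return the model's answer as a string. Appending the question and the answer to `history` as `user` and `assistant` entries carries the conversation into the next call.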