language:
  - ko
  - en
license: apache-2.0
tags:
  - text-generation
  - qwen2.5
  - korean
  - instruct
  - mlx
  - 8bit
pipeline_tag: text-generation

Qwen2.5-7B-Instruct-kowiki-qa-8bit (MLX-converted model)

Requirements

  • pip install mlx-lm
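
Before running the examples, it can help to confirm that the package is visible to the current Python environment. This is a minimal sketch, assuming the `mlx-lm` package provides the `mlx_lm` module that the Python example imports; the helper name `mlx_lm_available` is made up for illustration.

```python
import importlib.util


def mlx_lm_available() -> bool:
    """Return True if the `mlx_lm` module can be located (it is not imported)."""
    return importlib.util.find_spec("mlx_lm") is not None


print("mlx_lm installed:", mlx_lm_available())
```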

Usage

  • Generate with CLI

    mlx_lm.generate --model sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit --prompt "하늘이 파란 이유가 뭐야?"
    
  • In Python

    from mlx_lm import load, generate
    
    model, tokenizer = load(
        "sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
        tokenizer_config={"trust_remote_code": True},
    )
    
    prompt = "하늘이 파란 이유가 뭐야?"  # "Why is the sky blue?"
    
    messages = [
        {"role": "system", "content": "당신은 친절한 챗봇입니다."},  # "You are a friendly chatbot."
        {"role": "user", "content": prompt},
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    
    text = generate(
        model,
        tokenizer,
        prompt=prompt,
        # verbose=True,
        # max_tokens=8196,
        # temp=0.0,
    )
    
    # Show the generated answer (nothing is printed unless verbose=True).
    print(text)
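
For intuition about the string that `apply_chat_template` builds: Qwen2.5 instruct models use a ChatML-style layout in which each message is wrapped in `<|im_start|>` and `<|im_end|>` markers. The sketch below approximates that layout in plain Python. It is an illustration only; `render_chatml` is a hypothetical helper, and the template shipped with the tokenizer remains the source of truth and may differ in details such as default system prompts or tool handling.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style prompt for a list of {"role", "content"} dicts."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "당신은 친절한 챗봇입니다."},
    {"role": "user", "content": "하늘이 파란 이유가 뭐야?"},
]
print(render_chatml(messages))
```

In real use, prefer `tokenizer.apply_chat_template` as in the example above, since it applies the exact template bundled with the model.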
    
  • OpenAI-Compatible HTTP Server

    mlx_lm.server --model sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit --host 0.0.0.0
    
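    # Client side: run this in Python while the server above is running.
    import openai
    
    client = openai.OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed",  # placeholder; the openai client requires a value
    )
    
    prompt = "하늘이 파란 이유가 뭐야?"  # "Why is the sky blue?"
    
    messages = [
        {"role": "system", "content": "당신은 친절한 챗봇입니다."},
        {"role": "user", "content": prompt},
    ]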
    res = client.chat.completions.create(
        model='sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit',
        messages=messages,
        temperature=0.2,
    )
    
    print(res.choices[0].message.content)
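
The same endpoint can also be called without the `openai` package. This is a minimal sketch using only the standard library, assuming the server is listening on `localhost:8080` as in the client example and exposes the OpenAI-style `/v1/chat/completions` route; `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

MODEL = "sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit"


def build_chat_request(prompt, base_url="http://localhost:8080/v1", temperature=0.2):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("하늘이 파란 이유가 뭐야?"))` should print the model's answer. Building the request separately from sending it makes the payload easy to inspect before any network call is made.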