Qwen2.5-7B-Instruct-kowiki-qa-8bit (MLX-converted model)

Requirements

  • pip install mlx-lm

Usage

  • Generate with CLI

    mlx_lm.generate --model mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-8bit --prompt "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"
    
  • In Python

    from mlx_lm import load, generate
    
    model, tokenizer = load(
        "mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
        tokenizer_config={"trust_remote_code": True},
    )
    
    prompt = "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"  # "Why is the sky blue?"
    
    messages = [
        {"role": "system", "content": "당신은 μΉœμ² ν•œ μ±—λ΄‡μž…λ‹ˆλ‹€."},
        {"role": "user", "content": prompt},
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    
    text = generate(
        model,
        tokenizer,
        prompt=prompt,
        # verbose=True,
        # max_tokens=8196,
        # temp=0.0,
    )
    
    print(text)
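    
    For interactive use, mlx-lm also provides a stream_generate helper that yields the response incrementally rather than returning it all at once. A minimal sketch; depending on the installed mlx-lm version, stream_generate yields either plain text segments or response objects with a .text attribute, so the loop below handles both:
    
    from mlx_lm import stream_generate
    
    for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=512):
        # Older mlx-lm releases yield decoded text segments; newer ones yield
        # response objects that carry the segment in `.text`.
        piece = chunk if isinstance(chunk, str) else chunk.text
        print(piece, end="", flush=True)
    print()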
    
  • OpenAI Compitable HTTP Server

    mlx_lm.server --model mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-8bit --host 0.0.0.0
    
    import openai
    
    
    client = openai.OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed",  # required by the OpenAI client; the local server does not check it
    )
    
    prompt = "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"
    
    messages = [
        {"role": "system", "content": "당신은 μΉœμ ˆν•œ μ±—λ΄‡μž…λ‹ˆλ‹€.",},
        {"role": "user", "content": prompt},
    ]
    res = client.chat.completions.create(
        model='mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-8bit',
        messages=messages,
        temperature=0.2,
    )
    
    print(res.choices[0].message.content)
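    
    If your mlx-lm version's server supports streaming chat completions (recent mlx_lm.server releases do), the standard OpenAI client streaming call works against the same endpoint; a minimal sketch:
    
    stream = client.chat.completions.create(
        model="mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
        messages=messages,
        temperature=0.2,
        stream=True,  # ask the server to send tokens as they are produced
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()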
    