---

language:
- ko
- en
license: apache-2.0
tags:
- text-generation
- qwen2.5
- korean
- instruct
- mlx
- 8bit
pipeline_tag: text-generation
---


## Qwen2.5-7B-Instruct-kowiki-qa-8bit (MLX-converted model)
- Converted from the original model [beomi/Qwen2.5-7B-Instruct-kowiki-qa](https://huggingface.co/beomi/Qwen2.5-7B-Instruct-kowiki-qa); a sketch of a typical conversion command follows.
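
An 8-bit MLX conversion like this one is typically produced with mlx-lm's converter; the command below is a minimal sketch, not necessarily the exact invocation used for this repo:

```bash
# Hypothetical reconstruction of the conversion step (flags from mlx-lm's convert CLI):
# quantize the original Hugging Face model to 8-bit MLX weights
mlx_lm.convert \
    --hf-path beomi/Qwen2.5-7B-Instruct-kowiki-qa \
    -q \
    --q-bits 8
```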


## Requirements
- `pip install mlx-lm`

## Usage
- [Generate with CLI](https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md#command-line)
    ```bash
    # Prompt: "Why is the sky blue?"
    mlx_lm.generate --model sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit --prompt "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"
    ```
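
    Common generation flags can be layered on top (names as in recent mlx-lm releases; availability may vary by version):
    ```bash
    # Cap the response length and lower the sampling temperature
    mlx_lm.generate --model sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit \
        --prompt "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?" \
        --max-tokens 512 --temp 0.2
    ```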


- [In Python](https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md#python-api)
    ```python
    from mlx_lm import load, generate

    model, tokenizer = load(
        "sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
        tokenizer_config={"trust_remote_code": True},
    )

    # "Why is the sky blue?"
    prompt = "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"

    messages = [
        # "You are a friendly chatbot."
        {"role": "system", "content": "당신은 μΉœμ ˆν•œ μ±—λ΄‡μž…λ‹ˆλ‹€."},
        {"role": "user", "content": prompt},
    ]
    # Render the chat template into a single prompt string
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    text = generate(
        model,
        tokenizer,
        prompt=prompt,
        # verbose=True,
        # max_tokens=8196,
        # temp=0.0,
    )
    print(text)
    ```


- [OpenAI Compatible HTTP Server](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/SERVER.md)
    ```bash
    mlx_lm.server --model sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit --host 0.0.0.0
    ```
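
    The server can also be exercised directly with curl against the OpenAI-style `/v1/chat/completions` route (default port 8080):
    ```bash
    curl http://localhost:8080/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
            "messages": [{"role": "user", "content": "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"}],
            "temperature": 0.2
        }'
    ```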


    ```python
    import openai

    # The local server does not check API keys, but the client requires one to be set.
    client = openai.OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed",
    )

    # "Why is the sky blue?"
    prompt = "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"

    messages = [
        # "You are a friendly chatbot."
        {"role": "system", "content": "당신은 μΉœμ ˆν•œ μ±—λ΄‡μž…λ‹ˆλ‹€."},
        {"role": "user", "content": prompt},
    ]
    res = client.chat.completions.create(
        model="sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
        messages=messages,
        temperature=0.2,
    )

    print(res.choices[0].message.content)
    ```
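
    The server also supports the standard OpenAI streaming interface (see SERVER.md); a minimal sketch reusing the client and messages above:
    ```python
    # Stream tokens as they are generated
    stream = client.chat.completions.create(
        model="sucream/Qwen2.5-7B-Instruct-kowiki-qa-8bit",
        messages=messages,
        temperature=0.2,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    ```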