Copycats committed
Commit 9fc0287 • 1 Parent(s): 72fe9fb

Update README.md
Files changed (1)
  1. README.md +86 -1
README.md CHANGED
@@ -1,3 +1,88 @@
  ---
- license: apache-2.0
+ base_model: yanolja/EEVE-Korean-Instruct-10.8B-v1.0
+ inference: false
+ language:
+ - ko
+ library_name: transformers
+ license: cc-by-nc-4.0
+ pipeline_tag: text-generation
  ---
+
+ # EEVE-Korean-Instruct-10.8B-v1.0-AWQ
+ - Model creator: [Yanolja](https://huggingface.co/yanolja)
+ - Original model: [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0)
+
+ <!-- description start -->
+ ## Description
+
+ This repo contains AWQ model files for [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0).
+
+ ### About AWQ
+
+ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings.
+
+ It is supported by:
+
+ - [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
+ - [vLLM](https://github.com/vllm-project/vllm) - Llama and Mistral models only
+ - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
+ - [Transformers](https://huggingface.co/docs/transformers) version 4.35.0 and later, from any code or client that supports Transformers (a minimal loading sketch follows this list)
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code
+
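+ As an illustrative sketch (not part of the original card), the AWQ weights can also be loaded directly from Python with Transformers, assuming `transformers>=4.35.0`, `autoawq`, and `accelerate` are installed and that this repo ships the original model's chat template:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "Copycats/EEVE-Korean-Instruct-10.8B-v1.0-AWQ"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # Transformers >= 4.35.0 can load AWQ checkpoints directly when autoawq is installed.
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ # Same prompts as the vLLM examples below:
+ # system: "You are an assistant who kindly answers the user's questions."
+ # user: "What should I do when I feel sad for no reason and tears well up?"
+ messages = [
+     {"role": "system", "content": "당신은 사용자의 질문에 친절하게 답변하는 어시스턴트입니다."},
+     {"role": "user", "content": "괜스레 슬퍼서 눈물이 나면 어떻게 하나요?"},
+ ]
+ # Assumes the tokenizer ships a chat template; otherwise build the prompt string manually.
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ output = model.generate(input_ids, max_new_tokens=256)
+ print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+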
+ <!-- description end -->
+
+ <!-- README_AWQ.md-use-from-vllm start -->
+ ## Using OpenAI Chat API with vLLM
+
+ Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
+
+ - Please ensure you are using vLLM version 0.2 or later.
+ - When using vLLM as a server, pass the `--quantization awq` parameter.
+
+ #### Start the OpenAI-Compatible Server:
+ - vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using the OpenAI API.
+
+ ```shell
+ python3 -m vllm.entrypoints.openai.api_server --model Copycats/EEVE-Korean-Instruct-10.8B-v1.0-AWQ --quantization awq --dtype auto
+ ```
+
+ #### Querying the model using OpenAI Chat API:
+ - You can use the create chat completion endpoint to communicate with the model in a chat-like interface:
+
+ ```shell
+ curl http://localhost:8000/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+         "model": "Copycats/EEVE-Korean-Instruct-10.8B-v1.0-AWQ",
+         "messages": [
+             {"role": "system", "content": "당신은 사용자의 질문에 친절하게 답변하는 어시스턴트입니다."},
+             {"role": "user", "content": "괜스레 슬퍼서 눈물이 나면 어떻게 하나요?"}
+         ]
+     }'
+ ```
+
+ #### Python Client Example:
+ - Using the `openai` Python package, you can also communicate with the model in a chat-like manner:
+
+ ```python
+ from openai import OpenAI
+
+ # Set OpenAI's API key and API base to use vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ chat_response = client.chat.completions.create(
+     model="Copycats/EEVE-Korean-Instruct-10.8B-v1.0-AWQ",
+     messages=[
+         {"role": "system", "content": "당신은 사용자의 질문에 친절하게 답변하는 어시스턴트입니다."},
+         {"role": "user", "content": "괜스레 슬퍼서 눈물이 나면 어떻게 하나요?"},
+     ],
+ )
+ print("Chat response:", chat_response)
+ ```
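+
+ The call above prints the full response object; with the `openai>=1.0` client, the assistant's reply text alone is available as `chat_response.choices[0].message.content`.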
+ <!-- README_AWQ.md-use-from-vllm end -->