Commit 348823d by Yarik ("Update space"), parent 083d21c. Files changed: README.md (+155 -1)
README.md CHANGED

@@ -8,4 +8,158 @@ pinned: false
  license: apache-2.0
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
## HF-LLM-API

Hugging Face LLM Inference API in OpenAI message format.

## Features

- Available models (2024/01/22):
  - `mistral-7b`, `mixtral-8x7b`, `nous-mixtral-8x7b`
- Adaptive prompt templates for different models
- Supports the OpenAI API format
- Endpoint usable via the official `openai-python` package
- Supports both streaming and non-streaming responses
- Supports API keys via both the HTTP auth header and an environment variable
- Docker deployment
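The "adaptive prompt templates" point can be illustrated with a small sketch. The helper name below is hypothetical (not the project's actual code); it shows how OpenAI-style messages might be flattened into Mistral's `[INST]` instruct format:

```python
# Hypothetical sketch (not the project's actual code) of flattening
# OpenAI-style messages into Mistral's [INST] instruct format.
def to_mistral_prompt(messages):
    """Render chat messages into a Mistral-style instruct prompt."""
    parts = []
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"[INST] {msg['content']} [/INST]")
        else:
            # assistant (and other) turns are emitted verbatim
            parts.append(msg["content"])
    return "<s>" + " ".join(parts)

print(to_mistral_prompt([{"role": "user", "content": "Hello"}]))
# <s>[INST] Hello [/INST]
```

Each supported model family would get its own renderer like this, selected by model name.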

## Run API Service

### Run in Command Line

**Install dependencies:**

```bash
# (requirements.txt was generated with: pipreqs . --force --mode no-pin)
pip install -r requirements.txt
```

**Run API:**

```bash
python -m apis.chat_api
```

## Run via Docker

**Docker build:**

```bash
sudo docker build -t hf-llm-api:1.0 . --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
```

**Docker run:**

```bash
# no proxy
sudo docker run -p 23333:23333 hf-llm-api:1.0

# with proxy
sudo docker run -p 23333:23333 --env http_proxy="http://<server>:<port>" hf-llm-api:1.0
```
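For repeated deployments, the same `docker run` flags can be captured in a compose file. This is a hypothetical `docker-compose.yml` sketch, not shipped with the project:

```yaml
# Hypothetical compose file mirroring the docker run command above.
services:
  hf-llm-api:
    image: hf-llm-api:1.0
    ports:
      - "23333:23333"
    environment:
      # optional; only needed behind a proxy
      - http_proxy=${http_proxy}
```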

## API Usage

### Using `openai-python`

See: [`examples/chat_with_openai.py`](https://github.com/ruslanmv/hf-llm-api-collection/blob/main/examples/chat_with_openai.py)

```py
from openai import OpenAI

# If running this service behind a proxy, you may need to unset `http(s)_proxy`.
base_url = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# Use the line below as a non-auth user:
# api_key = "sk-xxx"

client = OpenAI(base_url=base_url, api_key=api_key)
response = client.chat.completions.create(
    model="mixtral-8x7b",
    messages=[
        {
            "role": "user",
            "content": "what is your model",
        }
    ],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
    elif chunk.choices[0].finish_reason == "stop":
        print()
```
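On the server side, the feature list says the key may arrive either via the HTTP auth header or via an environment variable. A hypothetical resolution order (the function name and precedence are assumptions for illustration, not the project's actual code):

```python
import os

# Hypothetical sketch: resolve the API key, preferring the request's
# Authorization header over the HF_TOKEN environment variable.
def resolve_api_key(auth_header):
    """Return the Bearer token if present, else fall back to HF_TOKEN."""
    if auth_header and auth_header.startswith("Bearer "):
        return auth_header[len("Bearer "):]
    return os.environ.get("HF_TOKEN")

print(resolve_api_key("Bearer hf_abc123"))  # hf_abc123
```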

### Using POST requests

See: [`examples/chat_with_post.py`](https://github.com/ruslanmv/hf-llm-api-collection/blob/main/examples/chat_with_post.py)

```py
import ast
import json
import re

import httpx

# If running this service behind a proxy, you may need to unset `http(s)_proxy`.
chat_api = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# Use the line below as a non-auth user:
# api_key = "sk-xxx"

# Pass the key as an OpenAI-style Bearer token.
requests_headers = {"Authorization": f"Bearer {api_key}"}
requests_payload = {
    "model": "mixtral-8x7b",
    "messages": [
        {
            "role": "user",
            "content": "what is your model",
        }
    ],
    "stream": True,
}

with httpx.stream(
    "POST",
    chat_api + "/chat/completions",
    headers=requests_headers,
    json=requests_payload,
    timeout=httpx.Timeout(connect=20, read=60, write=20, pool=None),
) as response:
    # Streaming format follows the OpenAI cookbook:
    # https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb
    response_content = ""
    for line in response.iter_lines():
        # Strip SSE framing: the "data: " prefix and the "[DONE]" sentinel.
        remove_patterns = [r"^\s*data:\s*", r"^\s*\[DONE\]\s*"]
        for pattern in remove_patterns:
            line = re.sub(pattern, "", line).strip()

        if line:
            try:
                line_data = json.loads(line)
            except Exception as e:
                try:
                    # Fall back to Python-literal parsing for single-quoted payloads.
                    line_data = ast.literal_eval(line)
                except Exception:
                    print(f"Error: {line}")
                    raise e
            delta_data = line_data["choices"][0]["delta"]
            finish_reason = line_data["choices"][0]["finish_reason"]
            if "role" in delta_data:
                role = delta_data["role"]
            if "content" in delta_data:
                delta_content = delta_data["content"]
                response_content += delta_content
                print(delta_content, end="", flush=True)
            if finish_reason == "stop":
                print()
```
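The stream-parsing step in the loop above can be exercised in isolation. The helper name below is hypothetical; it applies the same `data:` / `[DONE]` cleanup to a single SSE line:

```python
import json
import re

# Hypothetical helper mirroring the SSE cleanup in the streaming loop above.
def parse_sse_line(line):
    """Strip SSE framing from one line; return the parsed JSON dict or None."""
    for pattern in (r"^\s*data:\s*", r"^\s*\[DONE\]\s*"):
        line = re.sub(pattern, "", line).strip()
    return json.loads(line) if line else None

chunk = parse_sse_line('data: {"choices": [{"delta": {"content": "Hi"}}]}')
print(chunk["choices"][0]["delta"]["content"])  # Hi
assert parse_sse_line("data: [DONE]") is None  # sentinel lines yield nothing
```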