imone committed on
Commit
af0f751
1 Parent(s): 0e07cf3

Update README.md

Files changed (1)
  1. README.md +19 -155
README.md CHANGED
@@ -38,39 +38,21 @@ pipeline_tag: text-generation
38
  <img src="https://styles.redditmedia.com/t5_6075m3/styles/profileIcon_71syco7c5lt81.png?width=256&height=256&frame=1&auto=webp&crop=256:256,smart&s=24bd3c71dc11edc5d4f88d0cbc1da72ed7ae1969" alt="RunPod Logo" style="width:30px; vertical-align: middle; display: inline-block; margin-right: 5px; margin-left: 5px; margin-top: 0px; margin-bottom: 0px;"/>
39
  </p>
40
 
41
- <div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center; ont-size: 0.5em; border: 0.8em solid #864AF9;">
42
  <a href="https://huggingface.co/openchat/openchat-3.5-0106" style="text-decoration: none; color: black;">
43
- <span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db; ">3.5</span>
44
- <span style="font-size: 1.0em; font-family: 'Helvetica'; color: white; background-color: #864AF9; vertical-align: top; border-radius: 6em; padding: 0.066em 0.4em; letter-spacing: 0.1em; font-weight: bold;">0106</span>
45
  <span style="font-size: 0.85em; font-family: 'Helvetica'; color: black;">
46
- <br> 🏆 The Overall Best Performing Open Source 7B Model 🏆
47
- <br> 🤖 Outperforms <span style="font-weight: bold;">ChatGPT</span> (March) and <span style="font-weight: bold;">Grok-1</span> 🤖
48
- <br> 🚀<span style="font-size: 1em; font-family: 'Helvetica'; color: black; font-weight: bold;">15</span>-point improvement in Coding over <span style="font-size: 0.9em;
49
- font-family: 'Helvetica'; color: black; font-weight: bold;">OpenChat-3.5🚀</span>
50
- <br><br><span style="font-size: 1em; font-family: 'Helvetica'; color: #3c72db; font-weight: bold;">New Features</span>
51
- <br> 💡 2 Modes: Coding + Generalist, Mathematical Reasoning 💡
52
- <br> 🧑‍⚖️ Experimental support for Evaluator and Feedback capabilities 🧑‍⚖️
53
  </span>
54
  </a>
55
  </div>
56
 
57
  <div style="display: flex; justify-content: center; align-items: center">
58
- <img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/openchat-bench-0106.png" style="width: 100%; border-radius: 1em">
59
  </div>
60
 
61
-
62
- <div>
63
- <h3> Table of Contents</h3>
64
- </div>
65
-
66
- 1. [Usage](#usage)
67
- 2. [Benchmarks](#benchmarks)
68
- 3. [Limitations](#limitations)
69
- 4. [License](#license)
70
- 6. [Citation](#citation)
71
- 7. [Acknowledgements](#acknowledgements)
72
-
73
-
74
  <div align="center">
75
  <h2> Usage </h2>
76
  </div>
@@ -81,56 +63,35 @@ Once started, the server listens at `localhost:18888` for requests and is compat
81
 
82
  If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.
83
 
84
- | Model | Size | Context | Weights | Serving |
85
- |-------------------|------|---------|------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
86
- | OpenChat-3.5-0106 | 7B | 8192 | [Huggingface](https://huggingface.co/openchat/openchat-3.5-0106) | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106 --engine-use-ray --worker-use-ray` |
87
 
88
  <details>
89
  <summary>Example request (click to expand)</summary>
90
 
91
- 💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks
92
-
93
  ```bash
94
  curl http://localhost:18888/v1/chat/completions \
95
  -H "Content-Type: application/json" \
96
  -d '{
97
- "model": "openchat_3.5",
98
  "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
99
  }'
100
  ```
101
 
102
- 🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems
103
-
104
- ```bash
105
- curl http://localhost:18888/v1/chat/completions \
106
- -H "Content-Type: application/json" \
107
- -d '{
108
- "model": "openchat_3.5",
109
- "condition": "Math Correct",
110
- "messages": [{"role": "user", "content": "10.3 − 7988.8133 = "}]
111
- }'
112
- ```
113
-
114
  </details>
115
 
116
  ### Conversation templates
117
 
118
- 💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks
119
 
120
  ```
121
  GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
122
  ```
123
 
124
- 🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems
125
-
126
- ```
127
- Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:
128
- ```
129
-
130
  ⚠️ **Notice:** Remember to set `<|end_of_turn|>` as end of generation token.
131
 
132
- The default (GPT4 Correct) template is also available as the integrated `tokenizer.chat_template`,
133
- which can be used instead of manually specifying the template:
134
 
135
  ```python
136
  messages = [
@@ -139,98 +100,7 @@ messages = [
139
  {"role": "user", "content": "How are you today?"}
140
  ]
141
  tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
142
- assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
143
- ```
144
-
145
- <div align="center">
146
- <h2> (Experimental) Evaluator / Feedback Capabilities </h2>
147
- </div>
148
-
149
- We've included evaluator capabilities in this release to advance open-source models as evaluators. You can use `Default Mode (GPT4 Correct)` with the following prompt (same as [Prometheus](https://huggingface.co/datasets/kaist-ai/Feedback-Collection)) to evaluate a response.
150
-
151
  ```
152
- ###Task Description:
153
- An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
154
- 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
155
- 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
156
- 3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)"
157
- 4. Please do not generate any other opening, closing, and explanations.
158
-
159
- ###The instruction to evaluate:
160
- {orig_instruction}
161
-
162
- ###Response to evaluate:
163
- {orig_response}
164
-
165
- ###Reference Answer (Score 5):
166
- {orig_reference_answer}
167
-
168
- ###Score Rubrics:
169
- [{orig_criteria}]
170
- Score 1: {orig_score1_description}
171
- Score 2: {orig_score2_description}
172
- Score 3: {orig_score3_description}
173
- Score 4: {orig_score4_description}
174
- Score 5: {orig_score5_description}
175
-
176
- ###Feedback:
177
- ```
178
- <div align="center">
179
- <h2> Benchmarks </h2>
180
- </div>
181
-
182
- | Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
183
- |-----------------------|----------|----------|----------|-----------|----------|----------|------------|----------|----------|----------|
184
- | **OpenChat-3.5-0106** | **7B** | **64.5** | 7.8 | **71.3** | **51.5** | **49.1** | 61.0 | 65.8 | **77.4** | 62.2 |
185
- | OpenChat-3.5-1210 | **7B** | 63.8 | 7.76 | 68.9 | 49.5 | 48.0 | **61.8** | 65.3 | 77.3 | 61.8 |
186
- | OpenChat-3.5 | **7B** | 61.6 | 7.81 | 55.5 | 47.6 | 47.4 | 59.1 | 64.3 | 77.3 | 63.5 |
187
- | ChatGPT (March)* | ???B | 61.5 | **7.94** | 48.1 | 47.6 | 47.1 | 57.7 | **67.3** | 74.9 | **70.1** |
188
- | | | | | | | | | | | |
189
- | OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
190
- | OpenOrca Mistral | 7B | 52.7 | 6.86 | 38.4 | 49.4 | 42.9 | 45.9 | 59.3 | 59.1 | 58.1 |
191
- | Zephyr-β^ | 7B | 34.6 | 7.34 | 22.0 | 40.6 | 39.0 | 40.8 | 39.8 | 5.1 | 16.0 |
192
- | Mistral | 7B | - | 6.84 | 30.5 | 39.0 | 38.0 | - | 60.1 | 52.2 | - |
193
-
194
- <details>
195
- <summary>Evaluation Details(click to expand)</summary>
196
-
197
- *: ChatGPT (March) results are from [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774), [Chain-of-Thought Hub](https://github.com/FranxYao/chain-of-thought-hub), and our evaluation. Please note that ChatGPT is not a fixed baseline and evolves rapidly over time.
198
-
199
- ^: Zephyr-β often fails to follow few-shot CoT instructions, likely because it was aligned with only chat data but not trained on few-shot data.
200
-
201
- **: Mistral and Open-source SOTA results are taken from reported results in instruction-tuned model papers and official repositories.
202
-
203
- All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same setting as in the AGIEval paper and Orca paper. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions in [our repository](https://github.com/imoneoi/openchat/#benchmarks).
204
-
205
-
206
- </details>
207
- <div>
208
- <h3>HumanEval+</h3>
209
- </div>
210
-
211
- | Model | Size | HumanEval+ pass@1 |
212
- |-----------------------------|--------|-------------------|
213
- | **OpenChat-3.5-0106** | **7B** | **65.9** |
214
- | ChatGPT (December 12, 2023) | ???B | 64.6 |
215
- | WizardCoder-Python-34B-V1.0 | 34B | 64.6 |
216
- | OpenChat 3.5 1210 | 7B | 63.4 |
217
- | OpenHermes 2.5 | 7B | 41.5 |
218
-
219
- <div>
220
- <h3>OpenChat-3.5 vs. Grok</h3>
221
- </div>
222
-
223
- 🔥 OpenChat-3.5-0106 (7B) now outperforms Grok-0 (33B) on **all 4 benchmarks** and Grok-1 (???B) on average and **3/4 benchmarks**.
224
-
225
- | | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
226
- |-----------------------|-------------|---------|----------|--------|-----------|----------|----------|
227
- | **OpenChat-3.5-0106** | Apache-2.0 | **7B** | **61.0** | 65.8 | **71.3** | **29.3** | **77.4** |
228
- | OpenChat-3.5-1210 | Apache-2.0 | **7B** | 60.1 | 65.3 | 68.9 | 28.9 | 77.3 |
229
- | OpenChat-3.5 | Apache-2.0 | **7B** | 56.4 | 64.3 | 55.5 | 28.6 | 77.3 |
230
- | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
231
- | Grok-1 | Proprietary | ???B | 55.8 | **73** | 63.2 | 23.9 | 62.9 |
232
-
233
- *: Grok results are reported by [X.AI](https://x.ai/).
234
 
235
  <div align="center">
236
  <h2> Limitations </h2>
@@ -250,10 +120,14 @@ OpenChat may sometimes generate information that does not exist or is not accura
250
  OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.
251
 
252
  <div align="center">
253
- <h2> License </h2>
254
  </div>
255
 
256
- Our OpenChat 3.5 code and models are distributed under the Apache License 2.0.
257
 
258
  <div align="center">
259
  <h2> Citation </h2>
@@ -266,14 +140,4 @@ Our OpenChat 3.5 code and models are distributed under the Apache License 2.0.
266
  journal={arXiv preprint arXiv:2309.11235},
267
  year={2023}
268
  }
269
- ```
270
-
271
- <div align="center">
272
- <h2> 💌 Contact </h2>
273
- </div>
274
-
275
- We look forward to hearing from you and collaborating on this exciting project!
276
-
277
- **Project Lead:**
278
- - Guan Wang [imonenext at gmail dot com]
279
- - [Alpay Ariyak](https://github.com/alpayariyak) [aariyak at wpi dot edu]
 
38
  <img src="https://styles.redditmedia.com/t5_6075m3/styles/profileIcon_71syco7c5lt81.png?width=256&height=256&frame=1&auto=webp&crop=256:256,smart&s=24bd3c71dc11edc5d4f88d0cbc1da72ed7ae1969" alt="RunPod Logo" style="width:30px; vertical-align: middle; display: inline-block; margin-right: 5px; margin-left: 5px; margin-top: 0px; margin-bottom: 0px;"/>
39
  </p>
40
 
41
+ <div style="background-color: white; padding: 0.7em; border-radius: 0.5em; color: black; display: flex; flex-direction: column; justify-content: center; text-align: center">
42
  <a href="https://huggingface.co/openchat/openchat-3.5-0106" style="text-decoration: none; color: black;">
43
+ <span style="font-size: 1.7em; font-family: 'Helvetica'; letter-spacing: 0.1em; font-weight: bold; color: black;">Llama 3 Version: OPENCHAT</span><span style="font-size: 1.8em; font-family: 'Helvetica'; color: #3c72db; ">3.6</span>
44
+ <span style="font-size: 1.0em; font-family: 'Helvetica'; color: white; background-color: #03045e; vertical-align: top; border-radius: 6em; padding: 0.066em 0.4em; letter-spacing: 0.1em; font-weight: bold;">20240522</span>
45
  <span style="font-size: 0.85em; font-family: 'Helvetica'; color: black;">
46
+ <br> 🏆 The Overall Best Performing Open Source 8B Model 🏆
47
+ <br> 🚀 Outperforms Llama-3-8B-Instruct and open-source finetunes 🚀
48
  </span>
49
  </a>
50
  </div>
51
 
52
  <div style="display: flex; justify-content: center; align-items: center">
53
+ <img src="" style="width: 100%; border-radius: 1em">
54
  </div>
55
 
56
  <div align="center">
57
  <h2> Usage </h2>
58
  </div>
 
63
 
64
  If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.
65
 
66
+ | Model | Size | Context | Weights | Serving |
67
+ |-----------------------|------|---------|-------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
68
+ | OpenChat-3.6-20240522 | 8B | 8192 | [Huggingface](https://huggingface.co/openchat/openchat-3.6-8b-20240522) | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.6-8b-20240522` |
69
 
70
  <details>
71
  <summary>Example request (click to expand)</summary>
72
 
73
  ```bash
74
  curl http://localhost:18888/v1/chat/completions \
75
  -H "Content-Type: application/json" \
76
  -d '{
77
+ "model": "openchat_3.6",
78
  "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
79
  }'
80
  ```
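
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the server is running at `localhost:18888` as in the curl example, that API keys (when enabled with `--api-keys`) are passed as a Bearer token, and that the response follows the OpenAI chat completion shape. The helper names `build_request` and `chat` are hypothetical and not part of OpenChat.

```python
import json
import urllib.request

# Assumed values: the endpoint and model name mirror the curl example above.
API_URL = "http://localhost:18888/v1/chat/completions"


def build_request(prompt, model="openchat_3.6", api_key=None):
    """Build (but do not send) a chat completion request for one user turn."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Assumption: keys set via --api-keys are sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response with choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("Write a poem to describe yourself")` would return the generated text. Building the request separately from sending it keeps the payload easy to inspect.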
81
 
82
  </details>
83
 
84
  ### Conversation templates
85
 
86
+ 💡 **Default Mode**: Best for coding, chat and general tasks
87
 
88
  ```
89
  GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
90
  ```
91
 
92
  ⚠️ **Notice:** Remember to set `<|end_of_turn|>` as end of generation token.
93
 
94
+ The default template is also available as the integrated `tokenizer.chat_template`, which can be used instead of manually specifying the template:
 
95
 
96
  ```python
97
  messages = [
 
100
  {"role": "user", "content": "How are you today?"}
101
  ]
102
  tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
103
  ```
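
When the integrated `tokenizer.chat_template` is not at hand, the default conversation template shown earlier can be assembled as a plain string. The sketch below is an illustration under stated assumptions, not OpenChat's own implementation: the helper names `build_prompt` and `trim_at_end_of_turn` are hypothetical, only `user` and `assistant` roles are handled, and any beginning-of-sequence token is left to the tokenizer.

```python
END_OF_TURN = "<|end_of_turn|>"

# Role prefixes copied from the default conversation template.
ROLE_PREFIXES = {
    "user": "GPT4 Correct User",
    "assistant": "GPT4 Correct Assistant",
}


def build_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into the default template."""
    parts = []
    for message in messages:
        prefix = ROLE_PREFIXES[message["role"]]  # KeyError on unsupported roles
        parts.append(f"{prefix}: {message['content']}{END_OF_TURN}")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append(f"{ROLE_PREFIXES['assistant']}:")
    return "".join(parts)


def trim_at_end_of_turn(generated_text):
    """Cut raw output at the first end-of-turn marker, if one is present."""
    return generated_text.split(END_OF_TURN, 1)[0]
```

`trim_at_end_of_turn` is only a fallback for backends that cannot be configured with `<|end_of_turn|>` as the end-of-generation token. Setting the stop token in the generation backend remains the preferable approach, since it also stops decoding early.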
104
 
105
  <div align="center">
106
  <h2> Limitations </h2>
 
120
  OpenChat may sometimes generate harmful, hate speech, biased responses, or answer unsafe questions. It's crucial to apply additional AI safety measures in use cases that require safe and moderated responses.
121
 
122
  <div align="center">
123
+ <h2> 💌 Contact </h2>
124
  </div>
125
 
126
+ We look forward to hearing from you and collaborating on this exciting project!
127
+
128
+ **Project Lead:**
129
+ - Guan Wang [imonenext at gmail dot com]
130
+ - [Alpay Ariyak](https://github.com/alpayariyak) [aariyak at wpi dot edu]
131
 
132
  <div align="center">
133
  <h2> Citation </h2>
 
140
  journal={arXiv preprint arXiv:2309.11235},
141
  year={2023}
142
  }
143
+ ```