imone committed on
Commit deccdb8
• 1 Parent(s): e23a160

Update README.md

Files changed (1)
  1. README.md +65 -20
README.md CHANGED
@@ -6,6 +6,7 @@ tags:
 - C-RLFT
 datasets:
 - openchat/openchat_sharegpt4_dataset
 - imone/OpenOrca_FLAN
 - LDJnr/LessWrong-Amplify-Instruct
 - LDJnr/Pure-Dove
@@ -19,8 +20,6 @@ library_name: transformers
 pipeline_tag: text-generation
 ---

- # OpenChat (1210 Version): Advancing Open-source Language Models with Mixed-Quality Data
-
 <div align="center">
 <img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/logo_new.png" style="width: 65%">
 </div>
@@ -34,18 +33,27 @@ pipeline_tag: text-generation
 <a href="https://arxiv.org/pdf/2309.11235.pdf">Paper</a>
 </p>

- **🔥 **

- | Model                   | HumanEval+ |
- |-------------------------|------------|
- | GPT-3.5 (December 2023) | 64.6       |
- | **OpenChat 3.5 1210**   | **63.4**   |
- | GPT-3.5 (March 2023)    | 64.6       |
- | OpenHermes 2.5          | 41.5       |

- <div align="center" style="justify-content: center; align-items: center;">
- <img src="https://github.com/alpayariyak/openchat/blob/master/assets/3.5-benchmarks.png?raw=true" style="width: 100%; border-radius: 0.5em">
- </div>


 OpenChat is an innovative library of open-source language models, fine-tuned with [C-RLFT](https://arxiv.org/pdf/2309.11235.pdf) - a strategy inspired by offline reinforcement learning. Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model. Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision.

@@ -59,10 +67,14 @@ Once started, the server listens at `localhost:18888` for requests and is compat

 If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.

 <details>
 <summary>Example request (click to expand)</summary>

- Default Mode (Chat & Coding)

 ```bash
 curl http://localhost:18888/v1/chat/completions \
@@ -73,7 +85,7 @@ curl http://localhost:18888/v1/chat/completions \
 }'
 ```

- Mathematical Reasoning Mode

 ```bash
 curl http://localhost:18888/v1/chat/completions \
@@ -87,24 +99,22 @@ curl http://localhost:18888/v1/chat/completions \

 </details>

- | Model             | Size | Context | Weights                                                          | Serving                                                                                                          |
- |-------------------|------|---------|------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
- | OpenChat 3.5 1210 | 7B   | 8192    | [Huggingface](https://huggingface.co/openchat/openchat_3.5_1210) | `python -m ochat.serving.openai_api_server --model openchat/openchat_3.5_1210 --engine-use-ray --worker-use-ray` |
-
 ### Conversation templates

- Default Mode (GPT4 Correct)

 ```
 GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
 ```

- Mathematical Reasoning Mode

 ```
 Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:
 ```

 The default (GPT4 Correct) template is also available as the integrated `tokenizer.chat_template`,
 which can be used instead of manually specifying the template:

@@ -118,6 +128,38 @@ tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
 assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
 ```
121
  ## Comparison with [X.AI Grok models](https://x.ai/)
122
 
123
  | | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
@@ -127,6 +169,8 @@ assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 42
127
  | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
128
  | Grok-1 | Proprietary | ???B | 55.8 | 73 | 63.2 | 23.9 | 62.9 |
129
 
 
 
130
  ## <a id="benchmarks"></a> Benchmarks
131
 
132
  | Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
@@ -175,6 +219,7 @@ OpenChat 3.5 was trained with C-RLFT on a collection of publicly available high-

 - [OpenChat ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset)
 - [Open-Orca with FLAN answers](https://huggingface.co/datasets/imone/OpenOrca_FLAN)
 - Capybara [1](https://huggingface.co/datasets/LDJnr/Pure-Dove) [2](https://huggingface.co/datasets/LDJnr/Verified-Camel) [3](https://huggingface.co/datasets/LDJnr/LessWrong-Amplify-Instruct)
 - [GOAT](https://huggingface.co/datasets/tiedong/goat)
 - [Glaive](https://huggingface.co/datasets/glaiveai/glaive-code-assistant)
 - C-RLFT
 datasets:
 - openchat/openchat_sharegpt4_dataset
+ - kaist-ai/Feedback-Collection
 - imone/OpenOrca_FLAN
 - LDJnr/LessWrong-Amplify-Instruct
 - LDJnr/Pure-Dove

 pipeline_tag: text-generation
 ---

 <div align="center">
 <img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/logo_new.png" style="width: 65%">
 </div>

 <a href="https://arxiv.org/pdf/2309.11235.pdf">Paper</a>
 </p>

36
+ # OpenChat 3.5: First Update Released on December 10th!
37
+
38
+ **๐Ÿš€ 15-point improvement in coding performance**
39
+
40
+ **๐Ÿ’ก Introducing a coding & generalist mode and a mathematical reasoning mode**
41
 
42
+ **๐Ÿง‘โ€โš–๏ธ Experimental support for evaluator and feedback capabilities**
 
 
 
 
 
43
 
44
+ **๐Ÿค– Outperforms Grok-1 in 3/4 and ChatGPT (March) in 5/8 benchmarks**
45
+
46
+ | Model | Size | HumanEval+ pass@1 |
47
+ |-----------------------------|----------|------------|
48
+ | ChatGPT (December 12, 2023) | - | 64.6 |
49
+ | WizardCoder-Python-34B-V1.0 | 34B | 64.6 |
50
+ | **OpenChat 3.5 (Dec 10)** | **7B** | **63.4** |
51
+ | OpenHermes 2.5 | 7B | 41.5 |
52
+
53
+ <div style="display: flex; justify-content: center; align-items: center">
54
+ <img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/openchat.png" style="width: 45%;">
55
+ <img src="https://raw.githubusercontent.com/imoneoi/openchat/master/assets/openchat_grok.png" style="width: 45%;">
56
+ </div>
57
 
58
  OpenChat is an innovative library of open-source language models, fine-tuned with [C-RLFT](https://arxiv.org/pdf/2309.11235.pdf) - a strategy inspired by offline reinforcement learning. Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model. Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision.
59
 
 
 If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. For security purposes, we recommend using an [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server.

+ | Model             | Size | Context | Weights                                                          | Serving                                                                                                          |
+ |-------------------|------|---------|------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
+ | OpenChat 3.5 1210 | 7B   | 8192    | [Huggingface](https://huggingface.co/openchat/openchat_3.5_1210) | `python -m ochat.serving.openai_api_server --model openchat/openchat_3.5_1210 --engine-use-ray --worker-use-ray` |
+
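As a rough sketch of what the curl examples send, the request body below builds the JSON for the OpenAI-compatible `/v1/chat/completions` endpoint. The `condition` field and the `openchat_3.5` model id are assumptions here, not confirmed by this diff; check the serving documentation for the exact names.

```python
import json

def build_chat_request(content: str, condition: str = "GPT4 Correct") -> str:
    # Assemble the JSON body for the OpenAI-compatible chat endpoint.
    # "condition" is assumed to select the conversation template:
    # "GPT4 Correct" (default mode) or "Math Correct" (math reasoning).
    body = {
        "model": "openchat_3.5",  # hypothetical model id
        "condition": condition,
        "messages": [{"role": "user", "content": content}],
    }
    return json.dumps(body)
```

A POST of this body to `http://localhost:18888/v1/chat/completions` would then mirror the curl examples.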
 <details>
 <summary>Example request (click to expand)</summary>

+ 💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks

 ```bash
 curl http://localhost:18888/v1/chat/completions \

 }'
 ```

+ 🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems

 ```bash
 curl http://localhost:18888/v1/chat/completions \

 </details>

 ### Conversation templates

+ 💡 **Default Mode (GPT4 Correct)**: Best for coding, chat and general tasks

 ```
 GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
 ```

+ 🧮 **Mathematical Reasoning Mode**: Tailored for solving math problems

 ```
 Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:
 ```

+ ⚠️ **Notice:** Remember to set `<|end_of_turn|>` as the end-of-generation token.
+
 The default (GPT4 Correct) template is also available as the integrated `tokenizer.chat_template`,
 which can be used instead of manually specifying the template:

 assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
 ```
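For readers assembling the prompt string by hand rather than via `tokenizer.chat_template`, the GPT4 Correct template shown above can be sketched in plain Python (the `render_gpt4_correct` helper is hypothetical, not part of the ochat package):

```python
def render_gpt4_correct(messages):
    """Render a chat as the GPT4 Correct template shown above."""
    role_names = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    # Each turn becomes "<Role>: <content><|end_of_turn|>" with no separators.
    parts = [f"{role_names[m['role']]}: {m['content']}<|end_of_turn|>" for m in messages]
    # End with an open assistant turn so the model generates the reply.
    parts.append("GPT4 Correct Assistant:")
    return "".join(parts)

prompt = render_gpt4_correct([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
])
# "prompt" now matches the default-mode template string above.
```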

+ ## 🧑‍⚖️ (Experimental) Evaluator / Feedback Capabilities
+
+ We've included evaluator capabilities in this release to advance open-source models as evaluators. You can use `Default Mode (GPT4 Correct)` with the following prompt (same as [Prometheus](https://huggingface.co/datasets/kaist-ai/Feedback-Collection)) to evaluate a response.
+
+ ```
+ ###Task Description:
+ An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given.
+ 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general.
+ 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric.
+ 3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)"
+ 4. Please do not generate any other opening, closing, and explanations.
+
+ ###The instruction to evaluate:
+ {orig_instruction}
+
+ ###Response to evaluate:
+ {orig_response}
+
+ ###Reference Answer (Score 5):
+ {orig_reference_answer}
+
+ ###Score Rubrics:
+ [{orig_criteria}]
+ Score 1: {orig_score1_description}
+ Score 2: {orig_score2_description}
+ Score 3: {orig_score3_description}
+ Score 4: {orig_score4_description}
+ Score 5: {orig_score5_description}
+
+ ###Feedback:
+ ```
+
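Since the evaluator prompt uses `{placeholder}` slots, it can be filled with Python's `str.format`. A minimal sketch (the template is abridged here; use the full prompt above in practice):

```python
# Abridged Prometheus-style evaluation template; placeholder names
# match the full prompt shown above.
EVAL_TEMPLATE = (
    "###The instruction to evaluate:\n{orig_instruction}\n\n"
    "###Response to evaluate:\n{orig_response}\n\n"
    "###Reference Answer (Score 5):\n{orig_reference_answer}\n\n"
    "###Score Rubrics:\n[{orig_criteria}]\n\n"
    "###Feedback:"
)

# Fill the slots for one evaluation (example values are illustrative).
prompt = EVAL_TEMPLATE.format(
    orig_instruction="What is 2 + 2?",
    orig_response="4",
    orig_reference_answer="4",
    orig_criteria="Is the arithmetic correct?",
)
```

The filled `prompt` is then sent as a normal user message in Default Mode (GPT4 Correct).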
 ## Comparison with [X.AI Grok models](https://x.ai/)

 |              | License     | # Param | Average | MMLU | HumanEval | MATH | GSM8k |

 | Grok-0       | Proprietary | 33B     | 44.5    | 65.7 | 39.7      | 15.7 | 56.8  |
 | Grok-1       | Proprietary | ???B    | 55.8    | 73   | 63.2      | 23.9 | 62.9  |

+ *: Grok results are reported by [X.AI](https://x.ai/).
+
 ## <a id="benchmarks"></a> Benchmarks

 | Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |

 - [OpenChat ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset)
 - [Open-Orca with FLAN answers](https://huggingface.co/datasets/imone/OpenOrca_FLAN)
+ - [Feedback-Collection](https://huggingface.co/datasets/kaist-ai/Feedback-Collection)
 - Capybara [1](https://huggingface.co/datasets/LDJnr/Pure-Dove) [2](https://huggingface.co/datasets/LDJnr/Verified-Camel) [3](https://huggingface.co/datasets/LDJnr/LessWrong-Amplify-Instruct)
 - [GOAT](https://huggingface.co/datasets/tiedong/goat)
 - [Glaive](https://huggingface.co/datasets/glaiveai/glaive-code-assistant)