nxphi47 committed
Commit da67d9b
1 Parent(s): 5f57aab

Update README.md

Files changed (1)
  1. README.md +14 -39
README.md CHANGED
@@ -101,7 +101,7 @@ Baselines were evaluated using their respective chat-template and system prompts
 
 #### Zero-shot MGSM
 
- [SeaLLM-7B-v2](https://huggingface.co/SeaLLMs/SeaLLM-7B-v2) also outperforms GPT-3.5 and Qwen-14B on the multilingual MGSM for Zh and Th.
+ [SeaLLM-7B-v2.5](https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5) also outperforms GPT-3.5 and Qwen-14B on the multilingual MGSM for Thai.
 
 | Model | MGSM-Zh | MGSM-Th
 |-----| ----- | ---
@@ -126,27 +126,6 @@ We evaluate models on 3 benchmarks following the recommended default setups: 5-s
 | SeaLLM-7B-v2.5 | Multi | 64.05 | 76.87 | 62.54 | 63.11 | 53.30 | 48.64 | 46.86
 
 
- ### MT-Bench
-
- **SeaLLM-7B-v2.5 only scores 7.40 on MT-Bench; better preference tuning is needed.**
- On the English [MT-bench](https://arxiv.org/abs/2306.05685), SeaLLM-7B-v2 achieves a score of **7.54** (3rd place on the leaderboard for the 7B category), outperforming many 70B models, and is arguably the only one that handles 10 SEA languages.
-
- Refer to [mt_bench/seallm_7b_v2.jsonl](https://huggingface.co/SeaLLMs/SeaLLM-7B-v2/blob/main/evaluation/mt_bench/seallm_7b_v2.jsonl) for the MT-bench predictions of SeaLLM-7B-v2, and [here](https://github.com/lm-sys/FastChat/issues/3013#issue-2118685341) to reproduce it.
-
- | Model | Access | Langs | MT-Bench
- | --- | --- | --- | --- |
- | GPT-4-turbo | closed | multi | 9.32
- | GPT-4-0613 | closed | multi | 9.18
- | Mixtral-8x7b (46B) | open | multi | 8.3
- | Starling-LM-7B-alpha | open | mono (en) | 8.0
- | OpenChat-3.5-7B | open | mono (en) | 7.81
- | **SeaLLM-7B-v2** | **open** | **multi (10+)** | **7.54**
- | **SeaLLM-7B-v2.5** | **open** | **multi (10+)** | **7.40**
- | [Qwen-14B](https://huggingface.co/Qwen/Qwen-14B-Chat) | open | multi | 6.96
- | [Llama-2-70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | open | mono (en) | 6.86
- | Mistral-7B-instruct | open | mono (en) | 6.84
-
-
 ### Sea-Bench
 
 Not ready
@@ -165,7 +144,6 @@ Hello world<eos>
 <|im_start|>assistant
 Hi there, how can I help?<eos>"""
 
- # NOTE: previous commit has \n between </s> and <|im_start|>, that was incorrect!
 # <|im_start|> is not a special token.
 # Transformers chat_template should be consistent with vLLM format below.
 
@@ -176,6 +154,9 @@ print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt)))
 ```
 
 #### Using transformers's chat_template
+
+ Install the latest transformers (>4.40)
+
 ```python
 
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -183,8 +164,8 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda" # the device to load the model onto
 
 # use bfloat16 to ensure the best performance.
- model = AutoModelForCausalLM.from_pretrained("SeaLLMs/SeaLLM-7B-v2", torch_dtype=torch.bfloat16, device_map=device)
- tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM-7B-v2")
+ model = AutoModelForCausalLM.from_pretrained("SeaLLMs/SeaLLM-7B-v2.5", torch_dtype=torch.bfloat16, device_map=device)
+ tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM-7B-v2.5")
 
 messages = [
 {"role": "system", "content": "You are a helpful assistant."},
@@ -195,7 +176,6 @@ messages = [
 
 encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
 print(tokenizer.convert_ids_to_tokens(encodeds[0]))
- # ['<s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'system', '<0x0A>', 'You', '▁are', '▁a', '▁helpful', '▁assistant', '.', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'user', '<0x0A>', 'Hello', '▁world', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'ass', 'istant', '<0x0A>', 'Hi', '▁there', ',', '▁how', '▁can', '▁I', '▁help', '▁you', '▁today', '?', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'user', '<0x0A>', 'Ex', 'plain', '▁general', '▁rel', 'ativity', '▁in', '▁details', '.', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'ass', 'istant', '<0x0A>']
 
 model_inputs = encodeds.to(device)
 model.to(device)
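The generation step of this transformers example is not part of the diff; a minimal sketch of it, assuming the standard `generate`/`batch_decode` API (the sampling settings here are illustrative, not the repo's values):

```python
import torch  # also required for torch.bfloat16 in the from_pretrained call above

# Minimal sketch (assumption): complete the transformers example with a
# standard generate + decode step and print the first decoded sequence.
generated_ids = model.generate(model_inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```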
@@ -210,11 +190,9 @@ print(decoded[0])
 
 ```python
 from vllm import LLM, SamplingParams
- TURN_TEMPLATE = "<|im_start|>{role}\n{content}</s>"
+ TURN_TEMPLATE = "<|im_start|>{role}\n{content}<eos>\n"
 TURN_PREFIX = "<|im_start|>{role}\n"
 
- # There is no \n between </s> and <|im_start|>.
-
 def seallm_chat_convo_format(conversations, add_assistant_prefix: bool, system_prompt=None):
 # conversations: list of dict with key `role` and `content` (openai format)
 if conversations[0]['role'] != 'system' and system_prompt is not None:
@@ -228,8 +206,8 @@ def seallm_chat_convo_format(conversations, add_assistant_prefix: bool, system_p
 text += prompt
 return text
 
- sparams = SamplingParams(temperature=0.1, max_tokens=1024, stop=['</s>', '<|im_start|>'])
- llm = LLM("SeaLLMs/SeaLLM-7B-v2", dtype="bfloat16")
+ sparams = SamplingParams(temperature=0.1, max_tokens=1024, stop=['<eos>', '<|im_start|>'])
+ llm = LLM("SeaLLMs/SeaLLM-7B-v2.5", dtype="bfloat16")
 
 message = "Explain general relativity in details."
 prompt = seallm_chat_convo_format(message, True)
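The diff only shows fragments of `seallm_chat_convo_format`; a minimal, self-contained sketch of how the updated `<eos>`-terminated turn template is typically assembled into a prompt string (the helper name `format_conversation` and its loop are assumptions, not the repo's code):

```python
TURN_TEMPLATE = "<|im_start|>{role}\n{content}<eos>\n"
TURN_PREFIX = "<|im_start|>{role}\n"

def format_conversation(conversations, add_assistant_prefix=True, system_prompt=None):
    # conversations: list of {"role": ..., "content": ...} dicts (OpenAI format)
    if conversations[0]["role"] != "system" and system_prompt is not None:
        conversations = [{"role": "system", "content": system_prompt}] + conversations
    text = "".join(TURN_TEMPLATE.format(role=t["role"], content=t["content"]) for t in conversations)
    if add_assistant_prefix:
        # leave the assistant turn open so the model completes it
        text += TURN_PREFIX.format(role="assistant")
    return text

print(format_conversation([{"role": "user", "content": "Hello world"}]))
# <|im_start|>user
# Hello world<eos>
# <|im_start|>assistant
```

The printed string matches the plain-text chat format shown earlier in the README, which is what the `stop=['<eos>', '<|im_start|>']` sampling parameters rely on.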
@@ -238,7 +216,7 @@ gen = llm.generate(prompt, sampling_params)
 print(gen[0].outputs[0].text)
 ```
 
- #### Fine-tuning SeaLLM-7B-v2
+ #### Fine-tuning SeaLLM-7B-v2.5
 
 Should follow the chat format and accurately mask out source tokens. Here is an example.
 
@@ -250,7 +228,7 @@ conversations = [
 {"role": "user", "content": "Tell me a joke."},
 {"role": "assistant", "content": "Why don't scientists trust atoms? Because they make up everything."},
 ]
- def seallm_7b_v2_tokenize_multi_turns(tokenizer, conversations, add_assistant_prefix=False):
+ def seallm_7b_v25_tokenize_multi_turns(tokenizer, conversations, add_assistant_prefix=False):
 """
 Inputs:
 conversations: list of dict following openai format, eg
@@ -271,7 +249,7 @@ def seallm_7b_v2_tokenize_multi_turns(tokenizer, conversations, add_assistant_pr
 labels = sample['input_ids'].clone()
 labels[sample['token_type_ids'] == 0] = -100
 """
- TURN_TEMPLATE = "<|im_start|>{role}\n{content}</s>"
+ TURN_TEMPLATE = "<|im_start|>{role}\n{content}<eos>\n"
 TURN_PREFIX = "<|im_start|>{role}\n"
 sample = None
 assistant_prefix_len = None
@@ -304,12 +282,9 @@ def seallm_7b_v2_tokenize_multi_turns(tokenizer, conversations, add_assistant_pr
 return sample
 
 # ! testing
- sample = seallm_7b_v2_tokenize_multi_turns(tokenizer, conversations)
+ sample = seallm_7b_v25_tokenize_multi_turns(tokenizer, conversations)
 print(tokenizer.convert_ids_to_tokens(sample['input_ids']))
 print(sample['token_type_ids'])
- # ['<s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'system', '<0x0A>', 'You', '▁are', '▁hel', 'ful', '▁assistant', '.', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'user', '<0x0A>', 'Hello', '▁world', '.', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'ass', 'istant', '<0x0A>', 'Hi', '▁there', ',', '▁how', '▁can', '▁I', '▁help', '?', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'user', '<0x0A>', 'Tell', '▁me', '▁a', '▁joke', '.', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'ass', 'istant', '<0x0A>', 'Why', '▁don', "'", 't', '▁scientists', '▁trust', '▁atoms', '?', '▁Because', '▁they', '▁make', '▁up', '▁everything', '.', '</s>']
- # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
-
 
 
 ```
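As the docstring fragment above indicates, source-side tokens are masked with `-100`. A minimal sketch of deriving training labels from the returned sample, assuming `sample` contains `input_ids` and `token_type_ids` lists and PyTorch is used for training:

```python
import torch

# Minimal sketch (assumption): build loss labels from the tokenized sample,
# so that only assistant tokens (token_type_ids == 1) contribute to the loss.
input_ids = torch.tensor(sample["input_ids"])
token_type_ids = torch.tensor(sample["token_type_ids"])
labels = input_ids.clone()
labels[token_type_ids == 0] = -100  # ignore system/user (source) tokens
```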
@@ -329,7 +304,7 @@ If you find our project useful, we hope you would kindly star our repo and cite
 
 ```
 @article{damonlpsg2023seallm,
- author = {Xuan-Phi Nguyen*, Wenxuan Zhang*, Xin Li*, Mahani Aljunied*,
+ author = {Xuan-Phi Nguyen*, Wenxuan Zhang*, Xin Li*, Mahani Aljunied*, Weiwen Xu, Hou Pong Chan,
 Zhiqiang Hu, Chenhui Shen^, Yew Ken Chia^, Xingxuan Li, Jianyu Wang,
 Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang,
 Chaoqun Liu, Hang Zhang, Lidong Bing},
 