nbl97 committed
Commit b9d2194 · 1 Parent(s): 8fe10b5

Update README.md

  1. README.md +26 -14
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
  license: llama2
  ---
+
  <h3 align="center">
  Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
  </h3>
@@ -9,19 +10,23 @@ Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
  <a href="https://huggingface.co/Xwin-LM">
  <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue">
  </a>
+ <a href="https://github.com/Xwin-LM/Xwin-LM">
+ <img src="https://img.shields.io/badge/GitHub-yellow.svg?style=social&logo=github">
+ </a>
  </p>
 
 
 
  **Step up your LLM alignment with Xwin-LM!**
 
- Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models, reject sampling, reinforcement learning, etc. Our first release, built-upon on the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
+ Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models (RM), reject sampling, reinforcement learning from human feedback (RLHF), etc. Our first release, built-upon on the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
 
  ## News
 
- - :boom: [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which has achieved a win-rate against Davinci-003 of **95.57%** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking as **TOP-1** on AlpacaEval. **It was the FIRST model surpassing GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Also note its winrate v.s. GPT-4 is **60.61**.
- - :boom: [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 13B models.
- - :boom: [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 7B models.
+ - 💥 [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which has achieved a win-rate against Davinci-003 of **95.57%** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking as **TOP-1** on AlpacaEval. **It was the FIRST model surpassing GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Also note its winrate v.s. GPT-4 is **60.61**.
+ - 🔍 [Sep, 2023] RLHF plays crucial role in the strong performance of Xwin-LM-V0.1 release!
+ - 💥 [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 13B models.
+ - 💥 [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 7B models.
 
 
  ## Model Card
@@ -59,19 +64,22 @@ The table below displays the performance of Xwin-LM on [AlpacaEval](https://tats
 
  ### Xwin-LM performance on NLP foundation tasks.
 
- The following table provides a comparison of Xwin-LMs with other LLMs on NLP foundation tasks.
+ The following table provides a comparison of Xwin-LMs with other LLMs on NLP foundation tasks in [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
  | Model | MMLU 5-shot | ARC 25-shot | TruthfulQA 0-shot | HellaSwag 10-shot | Average |
  |------------------|-------------|-------------|-------------------|-------------------|------------|
- | Text-davinci-003 | <u>56.9<u/> | **85.2** | **59.3** | <u>82.2<u/> | **70.9** |
+ | Text-davinci-003 | 56.9 | **85.2** | 59.3 | 82.2 | 70.9 |
  |Vicuna-13b 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1 |
- |Guanaco 30B | 57.6 | 63.7 | 50.7 | **85.1** | 64.3 |
+ |Guanaco 30B | 57.6 | 63.7 | 50.7 | 85.1 | 64.3 |
  | WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2 |
  | WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2 |
- | WizardLM-30B 1.0 | **58.8** | <u>62.5<u/> | <u>52.4<u/> | 83.3 | <u>64.2<u/>|
- | **Xwin-LM-7B-V0.1** | 49.7 | 56.2 | 48.1 | 79.5 | 58.4 |
- | **Xwin-LM-13B-V0.1** | - | - | - | - | - |
- | **Xwin-LM-70B-V0.1** | - | - | - | - | - |
+ | WizardLM-30B 1.0 | 58.8 | 62.5 | 52.4 | 83.3 | 64.2|
+ | Llama-2-7B-Chat | 48.3 | 52.9 | 45.6 | 78.6 | 56.4 |
+ | Llama-2-13B-Chat | 54.6 | 59.0 | 44.1 | 81.9 | 59.9 |
+ | Llama-2-70B-Chat | 63.9 | 64.6 | 52.8 | 85.9 | 66.8 |
+ | **Xwin-LM-7B-V0.1** | 49.7 | 56.2 | 48.1 | 79.5 | 58.4 |
+ | **Xwin-LM-13B-V0.1** | 56.6 | 62.4 | 45.5 | 83.0 | 61.9 |
+ | **Xwin-LM-70B-V0.1** | **69.6** | 70.5 | **60.1** | **87.1** | **71.8** |
 
 
  ## Inference
@@ -84,7 +92,7 @@ A chat between a curious user and an artificial intelligence assistant. The assi
 
  ### HuggingFace Example
 
- ```
+ ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
  model = AutoModelForCausalLM.from_pretrained("Xwin-LM/Xwin-LM-7B-V0.1")
@@ -105,7 +113,7 @@ print(output)
 
  ### vllm Example
  Because Xwin-LM is based on Llama2, it also offers support for rapid inference using [vllm](https://github.com/vllm-project/vllm). Please refer to [vllm](https://github.com/vllm-project/vllm) for detailed installation instructions.
- ```
+ ```python
  from vllm import LLM, SamplingParams
  (
  prompt := "A chat between a curious user and an artificial intelligence assistant. "
@@ -123,6 +131,10 @@ for output in outputs:
  print(generated_text)
  ```
 
+ ## TODO
+
+ - [ ] Release the source code
+ - [ ] Release more capabilities, such as math, reasoning, and etc.
 
  ## Citation
  Please consider citing our work if you use the data or code in this repo.
@@ -139,4 +151,4 @@ Please consider citing our work if you use the data or code in this repo.
 
  ## Acknowledgements
 
- Thanks to [Llama 2](https://ai.meta.com/llama/), [FastChat](https://github.com/lm-sys/FastChat), [AlpacaFarm](https://github.com/tatsu-lab/alpaca_farm), and [vllm](https://github.com/vllm-project/vllm).
+ Thanks to [Llama 2](https://ai.meta.com/llama/), [FastChat](https://github.com/lm-sys/FastChat), [AlpacaFarm](https://github.com/tatsu-lab/alpaca_farm), and [vllm](https://github.com/vllm-project/vllm).
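
Reviewer note on the benchmark table added in this commit: the Average column appears to be the plain arithmetic mean of the four task scores (MMLU, ARC, TruthfulQA, HellaSwag), rounded to one decimal. The README does not state this, so treat it as an assumption. A minimal sketch that checks it against a few rows copied from the updated table (the names `ROWS`, `mean_score`, and `matches_reported` are illustrative, not part of the repository):

```python
# Scores copied from the updated table, in column order:
# MMLU 5-shot, ARC 25-shot, TruthfulQA 0-shot, HellaSwag 10-shot, reported Average.
ROWS = {
    "Text-davinci-003": (56.9, 85.2, 59.3, 82.2, 70.9),
    "Llama-2-70B-Chat": (63.9, 64.6, 52.8, 85.9, 66.8),
    "Xwin-LM-7B-V0.1": (49.7, 56.2, 48.1, 79.5, 58.4),
    "Xwin-LM-13B-V0.1": (56.6, 62.4, 45.5, 83.0, 61.9),
    "Xwin-LM-70B-V0.1": (69.6, 70.5, 60.1, 87.1, 71.8),
}


def mean_score(scores):
    """Arithmetic mean of the per-task scores."""
    return sum(scores) / len(scores)


def matches_reported(row, tolerance=0.051):
    """True if the mean of the four task scores agrees with the reported
    Average up to one-decimal rounding (assumed, hence the tolerance)."""
    *scores, reported = row
    return abs(mean_score(scores) - reported) <= tolerance


for name, row in ROWS.items():
    print(f"{name}: mean={mean_score(row[:4]):.3f} "
          f"reported={row[4]} consistent={matches_reported(row)}")
```

The tolerance is slightly above 0.05 so that a mean such as 71.825 still counts as consistent with a reported 71.8, whichever rounding rule the authors used.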