nbl97 committed on
Commit 2a9dfc3
1 Parent(s): 2411131

Update README.md

Files changed (1)
  1. README.md +106 -5
README.md CHANGED
@@ -10,19 +10,23 @@ Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
  <a href="https://huggingface.co/Xwin-LM">
  <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue">
  </a>
+ <a href="https://github.com/Xwin-LM/Xwin-LM">
+ <img src="https://img.shields.io/badge/GitHub-yellow.svg?style=social&logo=github">
+ </a>
  </p>
 
 
 
  **Step up your LLM alignment with Xwin-LM!**
 
- Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models, rejection sampling, reinforcement learning, etc. Our first release, built upon the Llama 2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
+ Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models (RM), rejection sampling, reinforcement learning from human feedback (RLHF), etc. Our first release, built upon the Llama 2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
 
  ## News
 
- - :boom: [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which has achieved a win-rate against Davinci-003 of **95.57%** on the [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking **TOP-1** on AlpacaEval. **It was the FIRST model surpassing GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Its win-rate against GPT-4 is **60.61%**.
- - :boom: [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved a **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 13B models.
- - :boom: [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved an **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 7B models.
+ - 💥 [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which has achieved a win-rate against Davinci-003 of **95.57%** on the [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking **TOP-1** on AlpacaEval. **It was the FIRST model surpassing GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Its win-rate against GPT-4 is **60.61%**.
+ - 🔍 [Sep, 2023] RLHF plays a crucial role in the strong performance of the Xwin-LM-V0.1 release!
+ - 💥 [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved a **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 13B models.
+ - 💥 [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved an **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 7B models.
 
 
  ## Model Card
@@ -50,4 +54,101 @@ The table below displays the performance of Xwin-LM on [AlpacaEval](https://tats
  | **Xwin-LM-7B-V0.1** | **87.35** | **76.40** | **47.57** |
  | Llama-2-13B-Chat | 81.09 | 64.22 | 30.92 |
 
- ##
+ ### Case study on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/)
+
+ | Prompt | GPT-4 response | Xwin-LM response |
+ |----------------|------------|--------|
+ |I've put a plate on top of an apple, and take the plate from kitchen to dinning room. Where is the apple now|The apple is now in the dining room, as it was underneath the plate when you moved the plate from the kitchen to the dining room.|The apple is still in the kitchen, as you have only moved the plate. The apple is under the plate, so if you want to bring the apple to the dining room, you should either move the apple directly or put the apple on the plate and then move the plate to the dining room.|
+ | Answer the following question.<br> How do you say "good evening" in French. | In French, "good evening" is said as "bonsoir." | In French, "good evening" is translated to "bonsoir." This greeting is used in the evening and nighttime hours, typically after 6 p.m. It is a combination of "bon," which means "good," and "soir," which means "evening." To pronounce it, say "bone-swahr." |
+
+
+ ### Xwin-LM performance on NLP foundation tasks
+
+ The following table compares Xwin-LM models with other LLMs on the NLP foundation tasks of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
+
+ | Model | MMLU 5-shot | ARC 25-shot | TruthfulQA 0-shot | HellaSwag 10-shot | Average |
+ |------------------|-------------|-------------|-------------------|-------------------|------------|
+ | Text-davinci-003 | 56.9 | **85.2** | 59.3 | 82.2 | 70.9 |
+ | Vicuna-13b 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1 |
+ | Guanaco 30B | 57.6 | 63.7 | 50.7 | 85.1 | 64.3 |
+ | WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2 |
+ | WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2 |
+ | WizardLM-30B 1.0 | 58.8 | 62.5 | 52.4 | 83.3 | 64.2 |
+ | Llama-2-7B-Chat | 48.3 | 52.9 | 45.6 | 78.6 | 56.4 |
+ | Llama-2-13B-Chat | 54.6 | 59.0 | 44.1 | 81.9 | 59.9 |
+ | Llama-2-70B-Chat | 63.9 | 64.6 | 52.8 | 85.9 | 66.8 |
+ | **Xwin-LM-7B-V0.1** | 49.7 | 56.2 | 48.1 | 79.5 | 58.4 |
+ | **Xwin-LM-13B-V0.1** | 56.6 | 62.4 | 45.5 | 83.0 | 61.9 |
+ | **Xwin-LM-70B-V0.1** | **69.6** | 70.5 | **60.1** | **87.1** | **71.8** |
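The Average column appears to be the unweighted arithmetic mean of the four benchmark scores, rounded to one decimal place; this is our reading of the numbers, not a definition stated by the leaderboard. A minimal Python sketch that recomputes it for the three Xwin-LM rows, with the scores copied from the table (`SCORES` and `average` are illustrative names):

```python
# Scores copied from the table, in column order:
# (MMLU 5-shot, ARC 25-shot, TruthfulQA 0-shot, HellaSwag 10-shot)
SCORES = {
    "Xwin-LM-7B-V0.1": (49.7, 56.2, 48.1, 79.5),
    "Xwin-LM-13B-V0.1": (56.6, 62.4, 45.5, 83.0),
    "Xwin-LM-70B-V0.1": (69.6, 70.5, 60.1, 87.1),
}


def average(scores):
    """Unweighted mean of the benchmark scores, rounded to one decimal place.

    Assumption: the table's Average column is computed this way.
    """
    return round(sum(scores) / len(scores), 1)


for name, scores in SCORES.items():
    print(f"{name}: {average(scores)}")
```

Running it prints 58.4, 61.9 and 71.8, which match the Average column for those rows.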
+
+
+ ## Inference
+
+ ### Conversation templates
+ To obtain the desired results, please strictly follow the conversation template when using our model for inference. Our model adopts the prompt format established by [Vicuna](https://github.com/lm-sys/FastChat) and supports **multi-turn** conversations.
+ ```
+ A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi! ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am Xwin-LM.</s>......
+ ```
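The template above can be assembled programmatically for multi-turn chats. This is a minimal sketch, not code from the Xwin-LM release: the helper name `build_prompt` and the list of `(user, assistant)` pairs are hypothetical, and the separators (a space after the system prompt, a space before `ASSISTANT:`, and `</s>` right after each completed assistant reply) are read off the example template.

```python
# System prompt copied from the conversation template.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(history, next_user_message):
    """Format completed (user, assistant) turns plus a new user message.

    Hypothetical helper. Each completed assistant reply is closed with "</s>";
    the prompt ends with "ASSISTANT:" so the model writes the next reply.
    """
    prompt = SYSTEM + " "
    for user_msg, assistant_msg in history:
        prompt += f"USER: {user_msg} ASSISTANT: {assistant_msg}</s>"
    prompt += f"USER: {next_user_message} ASSISTANT:"
    return prompt


print(build_prompt([("Hi!", "Hello.")], "Who are you?"))
```

With an empty history, `build_prompt([], "Hello, can you help me?")` yields the same single-turn prompt that the HuggingFace and vllm examples spell out by hand.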
+
+ ### HuggingFace Example
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained("Xwin-LM/Xwin-LM-7B-V0.1")
+ tokenizer = AutoTokenizer.from_pretrained("Xwin-LM/Xwin-LM-7B-V0.1")
+ # Single-turn prompt in the Vicuna-style conversation template
+ prompt = (
+     "A chat between a curious user and an artificial intelligence assistant. "
+     "The assistant gives helpful, detailed, and polite answers to the user's questions. "
+     "USER: Hello, can you help me? "
+     "ASSISTANT:"
+ )
+ inputs = tokenizer(prompt, return_tensors="pt")
+ # temperature only takes effect when sampling is enabled
+ samples = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
+ # Decode only the newly generated tokens, skipping the prompt
+ output = tokenizer.decode(samples[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(output)
+ # Example output (sampled, so it will vary):
+ # Of course! I'm here to help. Please feel free to ask your question or describe the issue you're having, and I'll do my best to assist you.
+ ```
+
+
+ ### vLLM Example
+ Because Xwin-LM is based on Llama 2, it also supports fast inference with [vLLM](https://github.com/vllm-project/vllm). Please refer to the [vLLM](https://github.com/vllm-project/vllm) repository for installation instructions.
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Single-turn prompt in the Vicuna-style conversation template
+ prompt = (
+     "A chat between a curious user and an artificial intelligence assistant. "
+     "The assistant gives helpful, detailed, and polite answers to the user's questions. "
+     "USER: Hello, can you help me? "
+     "ASSISTANT:"
+ )
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=4096)
+ llm = LLM(model="Xwin-LM/Xwin-LM-7B-V0.1")
+ outputs = llm.generate([prompt], sampling_params)
+
+ for output in outputs:
+     # Each result holds one or more completions; print the first one
+     generated_text = output.outputs[0].text
+     print(generated_text)
+ ```
+
+ ## TODO
+
+ - [ ] Release the source code
+ - [ ] Release more capabilities, such as math and reasoning
+
+ ## Citation
+ Please consider citing our work if you use the data or code in this repo.
+ ```
+ @software{xwin-lm,
+   title = {Xwin-LM},
+   author = {Xwin-LM Team},
+   url = {https://github.com/Xwin-LM/Xwin-LM},
+   version = {pre-release},
+   year = {2023},
+   month = {9},
+ }
+ ```
+
+ ## Acknowledgements
+
+ Thanks to [Llama 2](https://ai.meta.com/llama/), [FastChat](https://github.com/lm-sys/FastChat), [AlpacaFarm](https://github.com/tatsu-lab/alpaca_farm), and [vLLM](https://github.com/vllm-project/vllm).