Text Generation
Transformers
PyTorch
English
llama
causal-lm
Inference Endpoints
text-generation-inference
jon-tow committed on
Commit 5f31406
1 Parent(s): e0b7553

refactor: move script to file

Files changed (2)
  1. README.md +8 -53
  2. apply_delta.py +49 -0
README.md CHANGED
@@ -17,63 +17,18 @@ datasets:
 
 StableVicuna-13B is a [Vicuna-13B](https://vicuna.lmsys.org/) model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets.
 
-### Apply Delta weights
+### Apply Delta Weights
 
-```python
-"""
-Usage:
-python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta pvduy/stable-vicuna-13b-delta
-"""
-import argparse
-
-import torch
-from tqdm import tqdm
-from transformers import AutoTokenizer, AutoModelForCausalLM
-
-
-def apply_delta(base_model_path, target_model_path, delta_path):
-    print("Loading base model")
-    base = AutoModelForCausalLM.from_pretrained(
-        base_model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)
-
-    print("Loading delta")
-    delta = AutoModelForCausalLM.from_pretrained(delta_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)
-    delta_tokenizer = AutoTokenizer.from_pretrained(delta_path)
-
-    DEFAULT_PAD_TOKEN = "[PAD]"
-    base_tokenizer = AutoTokenizer.from_pretrained(base_model_path, use_fast=False)
-    num_new_tokens = base_tokenizer.add_special_tokens(dict(pad_token=DEFAULT_PAD_TOKEN))
-
-    base.resize_token_embeddings(len(base_tokenizer))
-    input_embeddings = base.get_input_embeddings().weight.data
-    output_embeddings = base.get_output_embeddings().weight.data
-    input_embeddings[-num_new_tokens:] = 0
-    output_embeddings[-num_new_tokens:] = 0
-
-    print("Applying delta")
-    for name, param in tqdm(base.state_dict().items(), desc="Applying delta"):
-        assert name in delta.state_dict()
-        param.data += delta.state_dict()[name]
-
-    print("Saving target model")
-    base.save_pretrained(target_model_path)
-    delta_tokenizer.save_pretrained(target_model_path)
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--base-model-path", type=str, required=True)
-    parser.add_argument("--target-model-path", type=str, required=True)
-    parser.add_argument("--delta-path", type=str, required=True)
-
-    args = parser.parse_args()
-
-    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
+StableVicuna-13B cannot be used from the `stability/stable-vicuna-13b-delta` weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and `stability/stable-vicuna-13b-delta` weights. We provide the [`apply_delta.py`](https://huggingface.co/CarperAI/stable-vicuna-13b-delta/raw/main/apply_delta.py) script to automate the conversion, which you can run as:
+
+```sh
+python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta stabilityai/stable-vicuna-13b-delta
 ```
 
+
 ## Usage
 
-Quickly get started chatting with the model by using the [`transformers`](https://huggingface.co/docs/transformers) library:
+Once the delta weights are applied, get started chatting with the model by using the [`transformers`](https://huggingface.co/docs/transformers) library:
 
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
@@ -118,14 +73,14 @@ print(tokenizer.decode(tokens[0], skip_special_tokens=True))
 
 ### Training Dataset
 
-`stabilityai/stable-vicuna-13b` is fine-tuned on a mix of three datasets. [OpenAssistant Conversations Dataset (OASST1)](https://huggingface.co/datasets/OpenAssistant/oasst1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees, in 35 different languages;
+StableVicuna-13B is fine-tuned on a mix of three datasets. [OpenAssistant Conversations Dataset (OASST1)](https://huggingface.co/datasets/OpenAssistant/oasst1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees, in 35 different languages;
 [GPT4All Prompt Generations](https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations), a dataset of 400k prompts and responses generated by GPT-4; and [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine.
 
 The reward model used during RLHF was also trained on [OpenAssistant Conversations Dataset (OASST1)](https://huggingface.co/datasets/OpenAssistant/oasst1) along with two other datasets: [Anthropic HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), a dataset of preferences about AI assistant helpfulness and harmlessness; and [Stanford Human Preferences Dataset](https://huggingface.co/datasets/stanfordnlp/SHP), a dataset of 385K collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to legal advice.
 
 ### Training Procedure
 
-`stabilityai/sstable-vicuna-13b` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:
+`stabilityai/stable-vicuna-13b-delta` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:
 
 | Hyperparameter | Value |
 |-------------------|---------|
apply_delta.py ADDED
@@ -0,0 +1,49 @@
+"""
+Usage:
+python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta pvduy/stable-vicuna-13b-delta
+"""
+import argparse
+
+import torch
+from tqdm import tqdm
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+def apply_delta(base_model_path, target_model_path, delta_path):
+    print("Loading base model")
+    base = AutoModelForCausalLM.from_pretrained(
+        base_model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)
+
+    print("Loading delta")
+    delta = AutoModelForCausalLM.from_pretrained(delta_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)
+    delta_tokenizer = AutoTokenizer.from_pretrained(delta_path)
+
+    DEFAULT_PAD_TOKEN = "[PAD]"
+    base_tokenizer = AutoTokenizer.from_pretrained(base_model_path, use_fast=False)
+    num_new_tokens = base_tokenizer.add_special_tokens(dict(pad_token=DEFAULT_PAD_TOKEN))
+
+    base.resize_token_embeddings(len(base_tokenizer))
+    input_embeddings = base.get_input_embeddings().weight.data
+    output_embeddings = base.get_output_embeddings().weight.data
+    input_embeddings[-num_new_tokens:] = 0
+    output_embeddings[-num_new_tokens:] = 0
+
+    print("Applying delta")
+    for name, param in tqdm(base.state_dict().items(), desc="Applying delta"):
+        assert name in delta.state_dict()
+        param.data += delta.state_dict()[name]
+
+    print("Saving target model")
+    base.save_pretrained(target_model_path)
+    delta_tokenizer.save_pretrained(target_model_path)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--base-model-path", type=str, required=True)
+    parser.add_argument("--target-model-path", type=str, required=True)
+    parser.add_argument("--delta-path", type=str, required=True)
+
+    args = parser.parse_args()
+
+    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
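For reference, a minimal sketch of loading and prompting the merged model once `apply_delta.py` has written the `--target` directory. The local path and the `### Human:`/`### Assistant:` prompt framing are illustrative assumptions; see the README's Usage section for the full chat example.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical output directory produced by: python3 apply_delta.py ... --target stable-vicuna-13b
model_path = "stable-vicuna-13b"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

# Assumed prompt framing for illustration; adjust to the format shown in the model card.
prompt = "### Human: Give me three tips for staying focused.\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
tokens = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```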