michaelfeil committed on
Commit
ee75241
1 Parent(s): 73a9ff3

Upload OpenAssistant/falcon-7b-sft-top1-696 ctranslate fp16 weights

README.md ADDED
@@ -0,0 +1,182 @@
---
license: apache-2.0
language:
- en
- de
- es
- fr
tags:
- ctranslate2
- int8
- float16
- sft
pipeline_tag: text-generation
widget:
- text: >-
    <|prompter|>What is a meme, and what's the history behind this
    word?<|endoftext|><|assistant|>
- text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
- text: >-
    <|prompter|>Write a story about future of AI
    development<|endoftext|><|assistant|>
datasets:
- OpenAssistant/oasst1
library_name: transformers
---
# Fast-Inference with CTranslate2
Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.

Quantized version of [OpenAssistant/falcon-7b-sft-top1-696](https://huggingface.co/OpenAssistant/falcon-7b-sft-top1-696)
```bash
pip install "hf-hub-ctranslate2>=2.10.0" "ctranslate2>=3.16.0"
```

```python
# from transformers import AutoTokenizer
model_name = "michaelfeil/ct2fast-falcon-7b-sft-top1-696"

from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("{ORG}/{NAME}")
)
outputs = model.generate(
    text=["def fibonacci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False,
)
print(outputs)
```

Checkpoint compatible with [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
and [hf-hub-ctranslate2>=2.10.0](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"` (see the CPU sketch below)
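
A minimal CPU variant of the loader above, combining `device="cpu"` with `compute_type="int8"` from the list; everything else mirrors the CUDA example:

```python
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

# same loader as above, but int8 weights on CPU
model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-falcon-7b-sft-top1-696",
    device="cpu",
    compute_type="int8",
)
outputs = model.generate(
    text=["<|prompter|>What's the Earth total population<|endoftext|><|assistant|>"],
    max_length=64,
    include_prompt_in_result=False,
)
print(outputs)
```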

Converted on 2023-06-16 using
```
ct2-transformers-converter --model OpenAssistant/falcon-7b-sft-top1-696 --output_dir ~/tmp-ct2fast-falcon-7b-sft-top1-696 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
```
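
If you run the converter yourself, the resulting directory can also be loaded with the plain `ctranslate2` API instead of the `hf-hub-ctranslate2` wrapper. A minimal sketch, assuming the output path from the command above and that the tokenizer files were copied alongside the weights:

```python
import os

import ctranslate2
from transformers import AutoTokenizer

# output_dir from the ct2-transformers-converter call above
model_dir = os.path.expanduser("~/tmp-ct2fast-falcon-7b-sft-top1-696")

generator = ctranslate2.Generator(model_dir, device="cuda", compute_type="int8_float16")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

prompt = "<|prompter|>What is a meme?<|endoftext|><|assistant|>"
# CTranslate2 generates over token strings, not ids
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([tokens], max_length=64, sampling_topk=10)
print(tokenizer.decode(results[0].sequences_ids[0]))
```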

# Licence and other remarks:
This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

# Open-Assistant Falcon 7B SFT OASST-TOP1 Model

This model is a fine-tuning of TII's [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b) LLM.
It was trained on 11,123 top-1 (high-quality) demonstrations of the OASST data set (exported on June 2, 2023) with a batch size of 128 for 8 epochs with LIMA-style dropout (p=0.2) and a context length of 2048 tokens.

## Model Details

- **Finetuned from:** [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English, German, Spanish, French (and limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish)
- **Weights & Biases:** [Training log](https://wandb.ai/open-assistant/public-sft/runs/25apbcld) (Checkpoint: 696 steps)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **Demo:** [Continuations for 250 random prompts](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Fchat-gpt%2F2023-04-11_gpt-3.5-turbo_lottery.json%0Ahttps%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-06-05_OpenAssistant_falcon-7b-sft-top1-696_sampling_noprefix2.json)
- **License:** Apache 2.0
- **Contact:** [Open-Assistant Discord](https://ykilcher.com/open-assistant-discord)

## Prompting

Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with an `<|endoftext|>` token.

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
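
For multi-turn conversations the same pattern simply repeats; a minimal illustration (the `build_prompt` helper below is hypothetical, not part of this repo):

```python
def build_prompt(turns):
    """Format alternating turns into a single prompt string.

    Hypothetical helper: even indices are user messages, odd indices
    are assistant replies from earlier turns.
    """
    prompt = ""
    for i, text in enumerate(turns):
        role = "<|prompter|>" if i % 2 == 0 else "<|assistant|>"
        prompt += f"{role}{text}<|endoftext|>"
    # end with the assistant token so the model starts its reply
    return prompt + "<|assistant|>"

print(build_prompt(["What is a meme, and what's the history behind this word?"]))
# -> <|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```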

## Sample Code

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "OpenAssistant/falcon-7b-sft-top1-696"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

input_text = "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>"

sequences = pipeline(
    input_text,
    max_length=500,
    do_sample=True,
    return_full_text=False,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```

## Configuration Details

Model:
```
falcon-7b:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "tiiuae/falcon-7b"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  warmup_steps: 20
  gradient_checkpointing: true
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 8
  num_train_epochs: 8
  save_total_limit: 4
  residual_dropout: 0.2
  residual_dropout_lima: true
```

Dataset:
```
oasst-top1:
  # oasst_export: 11123 (100.00%)
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
        input_file_path: 2023-06-02_oasst_all_labels.jsonl.gz
        val_split: 0.05
        top_k: 1
```
173
+
174
+ Train command:
175
+ ```
176
+ deepspeed trainer_sft.py --configs defaults falcon-7b oasst-top1 --cache_dir <data_cache_dir> --output_dir <output_path> --deepspeed
177
+ ```
178
+
179
+ Export command:
180
+ ```
181
+ python export_model.py --dtype bf16 --hf_repo_name OpenAssistant/falcon-7b-sft-top1 --trust_remote_code --auth_token <auth_token> <output_path> --max_shard_size 2GB
182
+ ```
config.json ADDED
@@ -0,0 +1,6 @@
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "layer_norm_epsilon": null,
  "unk_token": "<|endoftext|>"
}
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.28.0.dev0"
}
model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e95c31f80654d9eea02b3fdf7f1961ea88260040a5b496c80c2ed6e4e66a7ef
size 6926465766
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
{
  "additional_special_tokens": [
    ">>QUESTION<<",
    ">>MIDDLE<<",
    ">>ANSWER<<",
    "<|system|>",
    ">>COMMENT<<",
    ">>TITLE<<",
    "<|assistant|>",
    ">>INTRODUCTION<<",
    "<|prefix_begin|>",
    "<|prefix_end|>",
    ">>PREFIX<<",
    ">>DOMAIN<<",
    ">>SUMMARY<<",
    ">>SUFFIX<<",
    "<|prompter|>",
    ">>ABSTRACT<<"
  ],
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "sep_token": "<|endoftext|>"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,8 @@
{
  "add_prefix_space": false,
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 2048,
  "special_tokens_map_file": null,
  "tokenizer_class": "PreTrainedTokenizerFast"
}
vocabulary.json ADDED
The diff for this file is too large to render. See raw diff