---
datasets:
- OpenAssistant/oasst1
pipeline_tag: text-generation
---

# Falcon-40b-chat-oasst1

Falcon-40b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
This model was fine-tuned in 4-bit using 🤗 [peft](https://github.com/huggingface/peft) adapters, [transformers](https://github.com/huggingface/transformers), and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
- Training relied on "Low-Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)): rather than fine-tuning the entire model, you train a small set of adapter weights and load them into the base model (a minimal sketch follows this list).
- Training took approximately 10 hours on a single NVIDIA A100-SXM 40GB GPU (via Google Colab).
- See the attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparameters) used to train the model.

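As a rough illustration of that adapter setup, the sketch below attaches a LoRA configuration to a causal LM with 🤗 peft. The hyperparameter values here are assumptions for illustration, not the exact settings from the training notebook.

```python
# Illustrative LoRA setup; hyperparameter values are assumptions, not the
# exact settings used to train this model (see the attached notebook).
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                # rank of the low-rank update matrices
    lora_alpha=32,                       # scaling applied to the adapter output
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# `base_model` is assumed to be the frozen Falcon-40B model loaded beforehand
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```
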
## Model Summary

- **Model Type:** Causal decoder-only
- **Language(s) (NLP):** English (primarily)
- **Base Model:** [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) (License: [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-40b#license), commercial use permitted)
- **Dataset:** [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) (License: [Apache 2.0](https://huggingface.co/datasets/OpenAssistant/oasst1/blob/main/LICENSE), commercial use permitted)

### Model Date

May 30, 2023

## Quick Start

To prompt the chat model, use the following format:

```
<human>: [Instruction]
<bot>:
```

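For programmatic use, a small helper (hypothetical, not part of the original notebook) can wrap an instruction in that template:

```python
def format_prompt(instruction: str) -> str:
    """Wrap a user instruction in the <human>/<bot> template the model expects."""
    return f"<human>: {instruction}\n<bot>:"

prompt = format_prompt("Create a list of things to do in San Francisco.")
```
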
### Example Dialogue

**Prompter**:

```
<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:
```

**Falcon-40b-chat-oasst1**:

> Coming

**Prompter**:

```
<human>: Create a list of things to do in San Francisco.
<bot>:
```

**Falcon-40b-chat-oasst1**:

> Coming

### Direct Use

This model has been fine-tuned on conversation trees from [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) and should only be used on data of a similar nature.

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

This model is trained mostly on English data and will not generalize appropriately to other languages. Furthermore, as it was trained on a large-scale corpus representative of the web, it will carry the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of this model develop guardrails and take appropriate precautions for any production use.

## How to Get Started with the Model

### Setup

```python
# Install and import packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
```

### GPU Inference in 4-bit

This requires a GPU with at least 27 GB of memory.

```python
# Load the adapter config, which points at the base model
peft_model_id = "dfurman/falcon-40b-chat-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)

# 4-bit quantization settings: NF4 with double quantization, bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map={"": 0},
    use_auth_token=True,
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# Attach the fine-tuned LoRA adapters to the base model
model = PeftModel.from_pretrained(model, peft_model_id)
```

```python
# Run the model
prompt = """<human>: My name is Daniel. Write a long email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:"""

batch = tokenizer(
    prompt,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
batch = batch.to("cuda:0")

with torch.cuda.amp.autocast():
    output_tokens = model.generate(
        input_ids=batch.input_ids,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.7,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Inspect the output
print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))
```

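Since `generate` returns the prompt tokens followed by the completion, you can optionally slice the prompt off and decode only the new tokens (a small convenience, not part of the original notebook):

```python
# Optional: decode only the newly generated tokens, dropping the echoed prompt
reply = tokenizer.decode(
    output_tokens[0][batch.input_ids.shape[1]:],
    skip_special_tokens=True,
)
print(reply)
```
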
## Reproducibility

- See the attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparameters) used to train the model.

### CUDA Info

- CUDA Version: 12.0
- GPU Name: NVIDIA A100-SXM
- Max Memory: {0: "37GB"}
- Device Map: {"": 0}

### Package Versions Employed

- `torch`==2.0.1+cu118
- `transformers`==4.30.0.dev0
- `peft`==0.4.0.dev0
- `accelerate`==0.19.0
- `bitsandbytes`==0.39.0
- `einops`==0.6.1