Text Generation · PEFT · Safetensors
dfurman committed
Commit b679fa8
1 Parent(s): fb3eaed

Update README.md

Files changed (1)
  1. README.md +24 -37
README.md CHANGED
@@ -7,10 +7,10 @@ pipeline_tag: text-generation

# Falcon-7b-chat-oasst1

Falcon-7b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
- This model was fine-tuned in 8-bit using 🤗 [peft](https://github.com/huggingface/peft) adapters, [transformers](https://github.com/huggingface/transformers), and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
- - The training relied on a recent method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)). Instead of fine-tuning the entire model you fine-tune lightweight adapters and load them inside the base model at inference.
- - Training took approximately 6.25 hours and was executed on a workstation with a single A100-SXM NVIDIA GPU with 37 GB of available memory (via Google Colab).
- - See attached [Notebook](https://huggingface.co/dfurman/falcon-7b-chat-oasst1/blob/main/finetune_falcon7b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.

## Model Summary
 
@@ -43,14 +43,7 @@ To prompt the chat model, use the following format:

**Falcon-7b-chat-oasst1**:
```
- Dear friends,
-
- I am so excited to host a dinner party at my home this Friday! I will be making a delicious meal, but I would love for you to bring your favorite bottle of wine to share with everyone.
-
- Please let me know if you can make it and if you have any dietary restrictions I should be aware of. I look forward to seeing you soon!
-
- Best,
- Daniel
```

### Example Dialogue 2
@@ -63,20 +56,9 @@ Daniel

**Falcon-7b-chat-oasst1**:
```
- Here are four things to do in San Francisco:
-
- 1. Visit the Golden Gate Bridge: The Golden Gate Bridge is one of the most iconic landmarks in the world and is a must-see for any visitor to San Francisco. The bridge spans 1.7 miles and offers stunning views of the city and the Pacific Ocean.
-
- 2. Explore Chinatown: San Francisco's Chinatown is one of the largest Chinatowns in the world and is a great place to experience the culture and history of the Chinese community in the city. The area is full of shops, restaurants, and cultural attractions.
-
- 3. Visit Alcatraz Island: Alcatraz Island is a former prison and now a national park. The island is home to a variety of wildlife and offers stunning views of the San Francisco Bay.
-
- 4. Take a cable car ride: San Francisco's cable cars are a classic way to get around the city and offer a unique experience. The cars run on a cable system that was first installed in 1873 and is still in use today.
-
- These are just a few of the many things to do in San Francisco. For more ideas, check out the official tourism website for the city.
```

-
### Direct Use

This model has been finetuned on conversation trees from [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) and should only be used on data of a similar nature.
@@ -97,30 +79,37 @@ We recommend users of this model to develop guardrails and to take appropriate p

### Setup
```python
- # Install and import packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
-
- import torch
- from peft import PeftModel, PeftConfig
- from transformers import AutoModelForCausalLM, AutoTokenizer
```

- ### GPU Inference in 8-bit

- This requires a GPU with at least 12GB memory.

```python
# load the model
peft_model_id = "dfurman/falcon-7b-chat-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
-     config.base_model_name_or_path,
-     return_dict=True,
-     load_in_8bit=True,
    device_map={"":0},
    trust_remote_code=True,
)
@@ -129,9 +118,7 @@ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

model = PeftModel.from_pretrained(model, peft_model_id)
- ```

- ```python
# run the model
prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:"""
@@ -161,7 +148,7 @@ print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))

## Reproducibility

- - See attached [Notebook](https://huggingface.co/dfurman/falcon-7b-chat-oasst1/blob/main/finetune_falcon7b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.

### CUDA Info
 
 
# Falcon-7b-chat-oasst1

Falcon-7b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
+ - The model was fine-tuned in 4-bit precision using `peft`, `transformers`, and `bitsandbytes`.
+ - The training relied on a method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. Instead of fine-tuning the entire model, you fine-tune lightweight adapters and load them inside the base model at inference.
+ - Fine-tuning took approximately 10 hours and was executed on a workstation with a single A100-SXM NVIDIA GPU with 37 GB of available memory.
+ - See attached [Colab Notebook](https://huggingface.co/dfurman/falcon-7b-chat-oasst1/blob/main/finetune_falcon7b_oasst1_with_bnb_peft.ipynb) for the code and hyperparams used to train the model.

## Model Summary
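The QLoRA bullet above can be made concrete with a little arithmetic. A minimal sketch of why LoRA adapters are lightweight, using a hypothetical layer shape and rank (not Falcon-7B's actual configuration):

```python
# LoRA replaces a full d_out x d_in weight update with two low-rank
# factors: B (d_out x r) and A (r x d_in). Only A and B are trained.
d_out, d_in, r = 4096, 4096, 8  # hypothetical layer shape and LoRA rank

full_update_params = d_out * d_in         # dense update: one value per weight
lora_params = d_out * r + r * d_in        # low-rank update: B plus A

print(full_update_params)                 # 16777216
print(lora_params)                        # 65536
print(full_update_params // lora_params)  # 256 (x fewer trainable weights)
```

Only the two low-rank factors are trained; the frozen base weights are untouched, which is what lets the adapter be loaded on top of the quantized base model at inference.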
 
 

**Falcon-7b-chat-oasst1**:
```
+ [coming]
```

### Example Dialogue 2
 

**Falcon-7b-chat-oasst1**:
```
+ [coming]
```

### Direct Use

This model has been finetuned on conversation trees from [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) and should only be used on data of a similar nature.
 

### Setup
```python
+ # Install packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
```

+ ### GPU Inference in 4-bit

+ This requires a GPU with at least XXGB of memory.

```python
+ import torch
+ from peft import PeftModel, PeftConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
# load the model
peft_model_id = "dfurman/falcon-7b-chat-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)

+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
model = AutoModelForCausalLM.from_pretrained(
+     config.base_model_name_or_path,
+     return_dict=True,
+     quantization_config=bnb_config,
    device_map={"":0},
    trust_remote_code=True,
)
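As a back-of-the-envelope check on why the 4-bit load above shrinks the footprint (weights only; activations, KV cache, and quantization overhead are excluded, and the parameter count is approximate):

```python
# Approximate size of ~7B parameters stored at different precisions.
n_params = 7_000_000_000  # Falcon-7B, roughly

for bits in (16, 8, 4):
    gib = n_params * bits / 8 / 1024**3  # bits -> bytes -> GiB
    print(f"{bits}-bit weights: ~{gib:.1f} GiB")
```

At 4 bits the base weights alone drop to roughly a quarter of their 16-bit size, which is what makes single-GPU inference practical here.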
 
tokenizer.pad_token = tokenizer.eos_token

model = PeftModel.from_pretrained(model, peft_model_id)

# run the model
prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:"""
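The `<human>:`/`<bot>:` prompt format shown above can be wrapped in a small helper; `format_prompt` is a hypothetical name for illustration, not part of the model repo:

```python
def format_prompt(user_message: str) -> str:
    """Build a prompt in the <human>/<bot> format this model was trained on."""
    return f"<human>: {user_message}\n<bot>:"

prompt = format_prompt("What are some things to do in San Francisco?")
print(prompt)
# <human>: What are some things to do in San Francisco?
# <bot>:
```

The model then generates the assistant turn after the trailing `<bot>:` marker.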
 

## Reproducibility

+ - See attached [Colab Notebook](https://huggingface.co/dfurman/falcon-7b-chat-oasst1/blob/main/finetune_falcon7b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.

### CUDA Info