Text Generation
PEFT
Safetensors
Eval Results
dfurman committed
Commit 85824ac
1 Parent(s): 5f3b0c5

Update README.md

Files changed (1): README.md (+7, -9)
README.md CHANGED
@@ -7,8 +7,8 @@ pipeline_tag: text-generation
 # Falcon-40b-chat-oasst1
 
 Falcon-40b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
-- This model was fine-tuned in 4-bit using 🤗 [peft](https://github.com/huggingface/peft) adapters, [transformers](https://github.com/huggingface/transformers), and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
-- The training relied on a recent method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. Instead of fine-tuning the entire model you fine-tune lightweight adapters and load them inside the base model at inference.
+- The model was fine-tuned in 4-bit precision using `peft`, `transformers`, and `bitsandbytes`.
+- The training relied on a method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. Instead of fine-tuning the entire model you fine-tune lightweight adapters and load them inside the base model at inference.
 - Training took approximately 10 hours and was executed on a workstation with a single A100-SXM NVIDIA GPU with 37 GB of available memory (via Google Colab).
 - See attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.
 
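The hunk above describes the QLoRA recipe only in prose. For readers unfamiliar with it, here is a minimal sketch of what that setup looks like in code; the hyperparameter values (rank, alpha, dropout) are illustrative assumptions, not the ones used for this model — those live in the linked notebook.

```python
# Illustrative QLoRA setup sketch; hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # Falcon shipped custom modeling code at the time
)
model = prepare_model_for_kbit_training(model)

# Lightweight low-rank adapters are the only trainable weights
lora_config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling
    target_modules=["query_key_value"],   # Falcon's fused attention projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the 40B base weights
```

Only the adapter weights receive gradients; the 4-bit base model stays frozen, which is what makes fine-tuning a 40B model feasible on a single A100.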
@@ -94,15 +94,11 @@ We recommend users of this model to develop guardrails and to take appropriate p
 
 ### Setup
 ```python
-# Install and import packages
+# Install packages
 !pip install -q -U bitsandbytes loralib einops
 !pip install -q -U git+https://github.com/huggingface/transformers.git
 !pip install -q -U git+https://github.com/huggingface/peft.git
 !pip install -q -U git+https://github.com/huggingface/accelerate.git
-
-import torch
-from peft import PeftModel, PeftConfig
-from transformers import AutoModelForCausalLM, AutoTokenizer
 ```
 
 ### GPU Inference in 4-bit
@@ -110,6 +106,10 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 This requires a GPU with at least 27GB memory.
 
 ```python
+import torch
+from peft import PeftModel, PeftConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
 # load the model
 peft_model_id = "dfurman/falcon-40b-chat-oasst1"
 config = PeftConfig.from_pretrained(peft_model_id)
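The diff elides the README lines between this hunk and the next, which contain the actual base-model and tokenizer loading. As context for the 27GB memory requirement above, here is a minimal sketch of the usual 4-bit loading pattern with `BitsAndBytesConfig` — an illustration of the technique, not necessarily the README's exact arguments.

```python
# Sketch of 4-bit base-model loading; arguments are assumptions.
import torch
from peft import PeftConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

peft_model_id = "dfurman/falcon-40b-chat-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)

# Quantize the 40B base model to 4-bit NF4 so it fits in ~27 GB of GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,  # resolves to tiiuae/falcon-40b
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```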
@@ -133,9 +133,7 @@ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
 tokenizer.pad_token = tokenizer.eos_token
 
 model = PeftModel.from_pretrained(model, peft_model_id)
-```
 
-```python
 # run the model
 prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
 <bot>:"""
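The diff stops at the prompt string. For completeness, a sketch of the generation call that typically follows such a prompt, using the `model`, `tokenizer`, and `prompt` from the hunks above; the sampling parameters here are assumptions, not the README's.

```python
# Sketch of the generation step; sampling parameters are assumptions.
import torch

# Tokenize the "<human>: ... <bot>:" prompt and move it to the model's device
batch = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_tokens = model.generate(
        **batch,
        max_new_tokens=256,              # assumed generation budget
        do_sample=True,
        temperature=0.7,                 # assumed sampling temperature
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, dropping the echoed prompt
new_tokens = output_tokens[0][batch["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```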
 
7
  # Falcon-40b-chat-oasst1
8
 
9
  Falcon-40b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
10
+ - The model was fine-tuned in 4-bit precision using `peft`, `transformers`, and `bitsandbytes`.
11
+ - The training relied on a method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. Instead of fine-tuning the entire model you fine-tune lightweight adapters and load them inside the base model at inference.
12
  - Training took approximately 10 hours and was executed on a workstation with a single A100-SXM NVIDIA GPU with 37 GB of available memory (via Google Colab).
13
  - See attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.
14
 
 
94
 
95
  ### Setup
96
  ```python
97
+ # Install packages
98
  !pip install -q -U bitsandbytes loralib einops
99
  !pip install -q -U git+https://github.com/huggingface/transformers.git
100
  !pip install -q -U git+https://github.com/huggingface/peft.git
101
  !pip install -q -U git+https://github.com/huggingface/accelerate.git
 
 
 
 
102
  ```
103
 
104
  ### GPU Inference in 4-bit
 
106
  This requires a GPU with at least 27GB memory.
107
 
108
  ```python
109
+ import torch
110
+ from peft import PeftModel, PeftConfig
111
+ from transformers import AutoModelForCausalLM, AutoTokenizer
112
+
113
  # load the model
114
  peft_model_id = "dfurman/falcon-40b-chat-oasst1"
115
  config = PeftConfig.from_pretrained(peft_model_id)
 
133
  tokenizer.pad_token = tokenizer.eos_token
134
 
135
  model = PeftModel.from_pretrained(model, peft_model_id)
 
136
 
 
137
  # run the model
138
  prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
139
  <bot>:"""