|
---
license: apache-2.0
library_name: peft
datasets:
- OpenAssistant/oasst1
pipeline_tag: text-generation
base_model: tiiuae/falcon-40b
inference: false
model-index:
- name: falcon-40b-openassistant-peft
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 62.63
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.59
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 57.77
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 51.02
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.45
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 13.34
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
---
|
|
|
|
|
<div align="center"> |
|
|
|
<img src="./falcon.webp" width="150px"> |
|
|
|
</div> |
|
|
|
# Falcon-40B-Chat-v0.1 |
|
|
|
Falcon-40B-Chat-v0.1 is a chatbot model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset. This repo includes only the LoRA adapters from fine-tuning with 🤗's [peft](https://github.com/huggingface/peft) package.
|
|
|
## Model Summary |
|
|
|
- **Model Type:** Causal language model (CLM)
|
- **Language(s):** English |
|
- **Base Model:** [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) (License: [Apache 2.0](https://huggingface.co/tiiuae/falcon-40b#license)) |
|
- **Dataset:** [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) (License: [Apache 2.0](https://huggingface.co/datasets/OpenAssistant/oasst1/blob/main/LICENSE)) |
|
- **License:** Apache 2.0, inherited from the base model and dataset
|
|
|
The model was fine-tuned in 4-bit precision using `peft` adapters, `transformers`, and `bitsandbytes`. Training used Low-Rank Adaptation ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. The run took approximately 10 hours and was executed on a workstation with a single NVIDIA A100-SXM GPU with 37 GB of available memory. See the attached [Colab Notebook](https://huggingface.co/dfurman/Falcon-40B-Chat-v0.1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code and hyperparameters used to train the model.
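
As a rough illustration of that setup, the sketch below wires `bitsandbytes` 4-bit quantization into a `peft` LoRA configuration. The specific values (`r`, `lora_alpha`, `lora_dropout`) are placeholders, not the hyperparameters used for this run; consult the linked notebook for those.

```python
# Illustrative QLoRA setup (placeholder hyperparameters; see the notebook
# for the actual ones used to train this model).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                # placeholder rank
    lora_alpha=32,                       # placeholder scaling
    lora_dropout=0.05,                   # placeholder dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```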
|
|
|
## Model Date |
|
|
|
May 30, 2023 |
|
|
|
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__falcon-40b-openassistant-peft).
|
|
|
| Metric | Value | |
|
|-----------------------|---------------------------| |
|
| Avg. | 51.17 | |
|
| ARC (25-shot) | 62.63 | |
|
| HellaSwag (10-shot) | 85.59 | |
|
| MMLU (5-shot) | 57.77 | |
|
| TruthfulQA (0-shot) | 51.02 | |
|
| Winogrande (5-shot) | 81.45 | |
|
| GSM8K (5-shot) | 13.34 | |
|
| DROP (3-shot) | 6.36 | |
|
|
|
## Quick Start |
|
|
|
To prompt the chat model, use the following format: |
|
|
|
```
<human>: [Instruction]
<bot>:
```
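
A tiny helper (hypothetical, not shipped with this repo) makes the template harder to get wrong:

```python
def format_prompt(instruction: str) -> str:
    """Wrap a user instruction in the <human>/<bot> template the model expects."""
    return f"<human>: {instruction}\n<bot>:"

prompt = format_prompt("Create a list of things to do in San Francisco.")
```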
|
|
|
### Example Dialogue 1 |
|
|
|
**Prompter**: |
|
```
<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:
```
|
|
|
**Falcon-40B-Chat-v0.1**: |
|
```
Dear Friends,

I'm excited to invite you to a dinner party at my home on Friday, [date]. I will be making the food, but please bring your own beverages.

I'm looking forward to seeing you and catching up!

Sincerely,
Daniel
```
|
|
|
### Example Dialogue 2 |
|
|
|
**Prompter**: |
|
```
<human>: Create a list of things to do in San Francisco.
<bot>:
```
|
|
|
**Falcon-40B-Chat-v0.1**: |
|
```
Here is a list of things to do in San Francisco:

1. Visit the Golden Gate Bridge
2. Take a cable car ride
3. Visit Alcatraz Island
4. Explore the Fisherman's Wharf
5. Visit the Exploratorium
6. Go to the San Francisco Zoo
7. Visit the California Academy of Sciences
8. Go to the de Young Museum
9. Visit the San Francisco Museum of Modern Art
10. Go to the Asian Art Museum
11. Visit the Conservatory of Flowers
12. Go to the California Historical Society
13. Visit the Walt Disney Family Museum
14. Go to the San Francisco Botanical Garden
15. Visit the San Francisco Museum of Craft and Design
16. Go to the Cartoon Art Museum
17. Visit the Contemporary Jewish Museum
18. Go to the Museum of the African Diaspora
19. Visit the Museum of the City of San Francisco
```
|
|
|
### Direct Use |
|
|
|
This model has been fine-tuned on conversation trees from [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) and should only be used on data of a similar nature.
|
|
|
### Out-of-Scope Use |
|
|
|
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
This model was trained mostly on English data and will not generalize appropriately to other languages. Furthermore, as the base model was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
|
|
|
### Recommendations |
|
|
|
We recommend that users of this model develop guardrails and take appropriate precautions for any production use.
|
|
|
## How to Get Started with the Model |
|
|
|
### Setup |
|
```python
# Install packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
```
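
If the dev builds above have drifted since this card was written, pinning the released versions recorded under "Package Versions Employed" below is a reasonable fallback (`transformers` and `peft` were installed from source at the time, so their nearest releases are approximations):

```python
# Fallback: pin the released versions recorded at training time.
# transformers/peft were dev builds; 4.30.0 / 0.4.0 are the nearest releases.
!pip install -q torch==2.0.1 transformers==4.30.0 peft==0.4.0 accelerate==0.19.0 bitsandbytes==0.39.0 einops==0.6.1
```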
|
|
|
### GPU Inference in 4-bit |
|
|
|
This requires a GPU with at least 27 GB of memory.
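
A quick, generic check that your device clears that bar:

```python
import torch

# Report total memory of the first visible CUDA device, in GB.
props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"{props.name}: {total_gb:.1f} GB total")
assert total_gb >= 27, "4-bit inference for this model needs roughly 27 GB or more"
```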
|
|
|
### First, Load the Model |
|
|
|
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

peft_model_id = "dfurman/Falcon-40B-Chat-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

# 4-bit NF4 quantization with double quantization and bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model on GPU 0
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map={"": 0},
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapters from this repo
model = PeftModel.from_pretrained(model, peft_model_id)
```
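
Optionally, sanity-check the quantized footprint before generating. `get_memory_footprint()` is a standard `transformers` model method (calls on the PEFT wrapper forward to the base model); the ~27 GB figure is approximate:

```python
# Rough sanity check: the 4-bit model should occupy on the order of 27 GB.
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.1f} GB")
model.eval()  # switch to inference mode
```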
|
|
|
### Next, Run the Model |
|
|
|
```python
prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:"""

batch = tokenizer(
    prompt,
    padding=True,
    truncation=True,
    return_tensors='pt',
)
batch = batch.to('cuda:0')

with torch.cuda.amp.autocast():
    output_tokens = model.generate(
        inputs=batch.input_ids,
        max_new_tokens=200,
        do_sample=False,  # greedy decoding; the sampling knobs below have no effect
        use_cache=True,
        temperature=1.0,
        top_k=50,
        top_p=1.0,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        bos_token_id=tokenizer.eos_token_id,  # reuse the EOS id where a BOS id is expected
    )

generated_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

# Extract the bot's reply from the decoded output
print(generated_text.split("<human>: ")[1].split("<bot>: ")[-1])
```
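
To watch tokens appear as they are generated instead of waiting for the full completion, recent `transformers` versions (including the 4.30 dev build listed below) accept a `TextStreamer`; a minimal variant of the call above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
with torch.cuda.amp.autocast():
    _ = model.generate(
        inputs=batch.input_ids,
        max_new_tokens=200,
        do_sample=False,
        use_cache=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
```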
|
|
|
## Reproducibility |
|
|
|
See the attached [Colab Notebook](https://huggingface.co/dfurman/Falcon-40B-Chat-v0.1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code and hyperparameters used to train the model.
|
|
|
### CUDA Info |
|
|
|
- CUDA Version: 12.0 |
|
- Hardware: 1 A100-SXM |
|
- Max Memory: {0: "37GB"} |
|
- Device Map: {"": 0} |
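
For reference, the `Max Memory` and `Device Map` values above correspond to `from_pretrained` arguments. A sketch of how they plug in (the exact call is in the linked notebook, and `bnb_config` is the 4-bit config from the loading example):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    quantization_config=bnb_config,  # 4-bit config, as in the loading example above
    device_map={"": 0},              # place every module on GPU 0
    max_memory={0: "37GB"},          # cap GPU 0 usage at 37 GB
    trust_remote_code=True,
)
```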
|
|
|
### Package Versions Employed |
|
|
|
- `torch`: 2.0.1+cu118 |
|
- `transformers`: 4.30.0.dev0 |
|
- `peft`: 0.4.0.dev0 |
|
- `accelerate`: 0.19.0 |
|
- `bitsandbytes`: 0.39.0 |
|
- `einops`: 0.6.1 |
|
|
|
|
|