|
---
language:
- en
license: other
tags:
- causal-lm
datasets:
- HuggingFaceH4/ultrachat_200k
- allenai/ultrafeedback_binarized_cleaned
- meta-math/MetaMathQA
- WizardLM/WizardLM_evol_instruct_V2_196k
- openchat/openchat_sharegpt4_dataset
- LDJnr/Capybara
- Intel/orca_dpo_pairs
- hkust-nlp/deita-10k-v0
- Anthropic/hh-rlhf
extra_gated_fields:
  Name: text
  Email: text
  Country: text
  Organization or Affiliation: text
  I ALLOW Stability AI to email me about new model releases: checkbox
---
|
# `StableLM 2 12B Chat` |
|
|
|
## Model Description |
|
|
|
`StableLM 2 12B Chat` is a 12 billion parameter instruction-tuned language model trained on a mix of publicly available and synthetic datasets, utilizing [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).
|
|
|
## Usage |
|
|
|
`StableLM 2 12B Chat` uses the ChatML instruction format, which can be applied via the tokenizer's `apply_chat_template` method:
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('stabilityai/stablelm-2-12b-chat', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    'stabilityai/stablelm-2-12b-chat',
    device_map="auto",
    trust_remote_code=True,
)

prompt = [{'role': 'user', 'content': 'How to achieve multiple rows of data into one row of data in Excel?'}]
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    return_tensors='pt'
)

# `apply_chat_template` returns a tensor of input ids here, so the prompt
# length is `inputs.shape[-1]` (there is no `.input_ids` attribute).
tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=100,
    temperature=0.7,
    do_sample=True
)
output = tokenizer.decode(tokens[:, inputs.shape[-1]:][0], skip_special_tokens=False)

print(output)
```
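For reference, `apply_chat_template` renders the single-turn conversation above into a ChatML prompt of the following shape (assuming the standard `<|im_start|>`/`<|im_end|>` delimiters; the exact rendering comes from the tokenizer's chat template):

```
<|im_start|>user
How to achieve multiple rows of data into one row of data in Excel?<|im_end|>
<|im_start|>assistant
```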
|
|
|
`StableLM 2 12B Chat` also supports function calling; the following example shows how to use it:
|
```python
system_prompt_func = """\
You are a helpful assistant with access to the following functions. You must use them if required -\n
[
    {
        "type": "function",
        "function": {
            "name": "TextToImage",
            "description": "This function is able to create, draw, or illustrate an image from a text prompt.",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "The description of the image that the user wants to create."
                    }
                },
                "required": [
                    "prompt"
                ]
            }
        }
    }
]
"""
messages = [
    {'role': 'system', 'content': system_prompt_func},
    {'role': 'user', 'content': 'Help me to generate a picture of Eiffel Tower in the night!'}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors='pt'
)

tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=1024,
    temperature=0.5,
    do_sample=True
)
output = tokenizer.decode(tokens[:, inputs.shape[-1]:][0], skip_special_tokens=False)

print(output)
"""
[
    {
        "name": "TextToImage",
        "arguments": {
            "prompt": "Eiffel Tower in the night"
        }
    }
]
"""
```
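The model emits its tool call as a JSON array, as in the example output above. A minimal sketch of parsing that output and dispatching it to your own handlers (the `handlers` mapping and the example string below are illustrative, not part of the model API; in practice, strip any ChatML special tokens from the decoded text before parsing):

```python
import json

# Example model output, matching the expected output shown above.
raw_output = '[{"name": "TextToImage", "arguments": {"prompt": "Eiffel Tower in the night"}}]'

# Map function names declared in the system prompt to local handlers.
handlers = {
    "TextToImage": lambda prompt: f"<image generated for: {prompt}>",
}

results = []
for call in json.loads(raw_output):
    handler = handlers.get(call["name"])
    if handler is None:
        raise ValueError(f"Model requested unknown function: {call['name']}")
    results.append(handler(**call["arguments"]))

print(results[0])
```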
|
|
|
|
|
## Model Details |
|
|
|
* **Developed by**: [Stability AI](https://stability.ai/) |
|
* **Model type**: `StableLM 2 12B Chat` is an auto-regressive language model based on the transformer decoder architecture.
|
* **Language(s)**: English |
|
|
* **Paper**: [Stable LM 2 Chat Technical Report](https://drive.google.com/file/d/1JYJHszhS8EFChTbNAf8xmqhKjogWRrQF/view?usp=sharing) |
|
* **Library**: [Alignment Handbook](https://github.com/huggingface/alignment-handbook.git) |
|
* **Finetuned from model**: [stabilityai/stablelm-2-12b](https://huggingface.co/stabilityai/stablelm-2-12b)
|
* **License**: [StabilityAI Non-Commercial Research Community License](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b/blob/main/LICENSE). If you want to use this model for your commercial products or purposes, please contact us [here](https://stability.ai/contact) to learn more. |
|
* **Contact**: For questions and comments about the model, please email `lm@stability.ai` |
|
|
|
### Training Dataset |
|
|
|
The dataset comprises a mixture of open, large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets), as well as an internal safety dataset:
|
1. SFT Datasets |
|
- HuggingFaceH4/ultrachat_200k |
|
- meta-math/MetaMathQA |
|
- WizardLM/WizardLM_evol_instruct_V2_196k |
|
- Open-Orca/SlimOrca |
|
- openchat/openchat_sharegpt4_dataset |
|
- LDJnr/Capybara |
|
- hkust-nlp/deita-10k-v0 |
|
|
|
2. Safety Datasets: |
|
- Anthropic/hh-rlhf |
|
- Internal Safety Dataset |
|
|
|
3. Preference Datasets:

 - allenai/ultrafeedback_binarized_cleaned

 - Intel/orca_dpo_pairs
|
|
|
|
|
## Performance |
|
|
|
### MT-Bench |
|
|
|
|
|
|
|
### OpenLLM Leaderboard |
|
|
|
|
|
### Training Infrastructure |
|
|
|
|
* **Hardware**: `StableLM 2 12B Chat` was trained on the Stability AI cluster across 8 nodes, each with 8 NVIDIA A100 80GB GPUs.
|
* **Code Base**: We use our internal script for SFT training and [HuggingFace Alignment Handbook](https://github.com/huggingface/alignment-handbook) for DPO training. |
|
|
|
## Use and Limitations |
|
|
|
### Intended Use |
|
|
|
The model is intended to be used in chat-like applications. Developers must evaluate the model for safety performance in their specific use case. Read more about [safety and limitations](#limitations-and-bias) below. |
|
|
|
### Limitations and Bias |
|
|
|
|
|
|
We strongly recommend pairing this model with an input and output classifier to prevent harmful responses.

Using this model will require guardrails around your inputs and outputs to reduce the likelihood of hallucinated or harmful content being returned.

Additionally, as each use case is unique, we recommend running your own suite of tests to ensure proper performance of this model.

Finally, do not use this model if it is unsuitable for your application, or for any application that may cause deliberate or unintentional harm to others.
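As a deliberately simplified illustration of the classifier pattern recommended above, the sketch below wraps generation with input and output checks. Here `moderate` is a keyword-based placeholder, a stand-in for a real safety classifier, and is not shipped with the model:

```python
def moderate(text: str) -> bool:
    """Placeholder safety check. Replace with a real input/output classifier."""
    blocked_terms = {"example-blocked-term"}
    lowered = text.lower()
    return not any(term in lowered for term in blocked_terms)

def safe_generate(generate_fn, user_input: str) -> str:
    """Run generation only when both the input and the output pass moderation."""
    if not moderate(user_input):
        return "[input rejected by safety filter]"
    output = generate_fn(user_input)
    if not moderate(output):
        return "[output withheld by safety filter]"
    return output

# Demo with a dummy generator standing in for the model.
print(safe_generate(lambda x: f"echo: {x}", "hello"))
```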
|
|
|
|
|
## How to Cite |