kartikmosaicml committed cef9067 (parent: 6e6da7b): Updating Readme and model card.

README.md CHANGED

MPT-30B-Chat is a chatbot-like model for dialogue generation.
It was built by finetuning [MPT-30B](https://huggingface.co/mosaicml/mpt-30b) on the [ShareGPT-Vicuna](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [Camel-AI](https://huggingface.co/camel-ai),
[GPTeacher](https://github.com/teknium1/GPTeacher), [Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), [Baize](https://github.com/project-baize/baize-chatbot), and some generated datasets.

* License: _CC-By-NC-SA-4.0_ (non-commercial use only)
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-30b-chat)

This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.

## Model Date

June 22, 2023

## Model License

_CC-By-NC-SA-4.0_ (non-commercial use only)

## Documentation

* [Blog post: MPT-30B: Raising the bar for open-source commercial foundation models](https://www.mosaicml.com/blog/mpt-30b)
* [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
* Questions: Feel free to contact us via the [MosaicML Community Slack](https://mosaicml.me/slack)!

### Example Dialogue

**ksreenivasan**:
> I want you to come up with a tweet based on this summary of the article:
>
> We are excited to extend the MosaicML Foundation Series with MPT-30B, a new commercially licensed open-source model that is significantly more powerful, and outperforms the original GPT-3. In addition, we are releasing two fine-tuned models, MPT-30B-Instruct and MPT-30B-Chat, that are built on top of MPT-30B and excel at short-form instruction following and multi-turn conversations, respectively.
> All MPT-30B models come with special features that differentiate them from other LLMs, including an 8k token context window (which can be further extended via finetuning; see MPT-7B-StoryWriter), support for context-length extrapolation via ALiBi, and efficient inference + training performance via FlashAttention. It also has strong coding abilities thanks to its pretraining data mixture.

**MPT-30B-Chat**:
> "Revolutionize your AI capabilities with MosaicML's latest addition, MPT-30B! Outperforming GPT-3, this open-source model is significantly more powerful and comes with finely-tuned models for exceptional instruction following and multi-turn conversations. Unleash the potential of an 8k token context window, ALiBi support, and FlashAttention. Experience the future of AI with MosaicML! #AI #ML #innovation"

**ksreenivasan**:
> Is there anything you want to say to MosaicML about this exciting news?

**MPT-30B-Chat**:
> As your AI marketing assistant, I would say that this is truly exciting news for the AI community! MosaicML's latest addition, MPT-30B, is poised to revolutionize the industry with its powerful open-source capabilities and finely-tuned models for instruction following and multi-turn conversations.
> The special features, including the 8k token context window, ALiBi support, and FlashAttention, are sure to set MosaicML's MPT-30B apart from other LLMs.
> Congratulations to the MosaicML team on this incredible achievement!

## How to Use

```python
import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'torch'
config.init_device = 'cuda'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,           # assumed: pass the modified config (this argument is not shown in the diff hunk)
  trust_remote_code=True   # assumed: required for MPT's custom model code
)
```
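
If the attention-implementation and device overrides above are not needed, the load can stay minimal. The sketch below is an assumption built only from the `transformers` calls already shown on this card; `trust_remote_code=True` is still required because MPT ships custom model code:

```python
import transformers

# Minimal load with default settings; MPT's custom model class requires trust_remote_code=True.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b-chat',
    trust_remote_code=True
)
```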

The model was trained initially with a sequence length of 4096, with an additional pretraining stage for sequence-length adaptation up to 8192. However, ALiBi enables users to increase the maximum sequence length even further during finetuning and/or inference. For example:

```python
import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,           # assumed: pass the modified config (this argument is not shown in the diff hunk)
  trust_remote_code=True   # assumed: required for MPT's custom model code
)
```
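
To see the effect of the larger `config.max_seq_len`, a quick check like the one below counts the tokens in a long prompt against the configured limit. This is an illustrative sketch: the prompt text is a placeholder, and the tokenizer load simply anticipates the tokenizer section that follows.

```python
from transformers import AutoTokenizer

# Count tokens in a long prompt and compare against the extended context window.
tok = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
long_prompt = "Summarize the following document:\n" + "lorem ipsum " * 2000  # placeholder text
n_tokens = len(tok(long_prompt)['input_ids'])
print(f"Prompt uses {n_tokens} of {config.max_seq_len} (input + output) tokens")
```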

This model was trained with the MPT-30B tokenizer, which is based on the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer and includes additional padding and eos tokens.

```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
```
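
As a quick sanity check on the extra tokens mentioned above, the illustrative snippet below prints the tokenizer's eos and padding tokens; it only reuses the `tokenizer` object just created.

```python
# Inspect the special tokens the MPT-30B tokenizer defines on top of the gpt-neox-20b base.
print("eos token:", tokenizer.eos_token)
print("pad token:", tokenizer.pad_token)
print("vocabulary size:", len(tokenizer))
```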

The model can then be used, for example, within a text-generation pipeline.
Note: when running Torch modules in lower precision, it is best practice to use the [torch.autocast context manager](https://pytorch.org/docs/stable/amp.html).

```python
import torch
from transformers import pipeline

with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# or using the HF pipeline
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
             max_new_tokens=100,   # assumed generation settings; the rest of this call falls outside the diff hunk
             do_sample=True,
             use_cache=True))
```

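No prompt template for multi-turn chat is shown here, so the sketch below is just one assumed way to turn a conversation like the example dialogue above into a single prompt string; the `build_prompt` helper and its speaker tags are hypothetical.

```python
# Hypothetical multi-turn prompt builder; the exact chat format MPT-30B-Chat expects
# is not specified here, so treat this formatting as an assumption.
def build_prompt(turns):
    """turns: list of (speaker, message) pairs, e.g. [('user', 'Hi!'), ('assistant', 'Hello!')]."""
    lines = [f"{speaker}: {message}" for speaker, message in turns]
    lines.append("assistant:")  # ask the model to continue as the assistant
    return "\n".join(lines)

prompt = build_prompt([("user", "Write a tweet announcing MPT-30B.")])
with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
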
### Training Configuration

This model was trained on 64 H100s for about 7.6 hours using the [MosaicML Platform](https://www.mosaicml.com/platform).
The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the AdamW optimizer.

## Limitations and Biases

## Citation

Please cite this model using the following format:

```
@online{MosaicML2023Introducing,
    author  = {MosaicML NLP Team},
    title   = {Introducing MPT-30B: Raising the bar for open-source commercial foundation models},
    year    = {2023},
    url     = {www.mosaicml.com/blog/mpt-30b},
    note    = {Accessed: 2023-06-22},
    urldate = {2023-06-22}
}
```