kartikmosaicml committed
Commit cef9067 (1 parent: 6e6da7b)

Updating Readme and model card.

Files changed (1): README.md (+37, -31)
README.md CHANGED
@@ -30,14 +30,14 @@ MPT-30B-Chat is a chatbot-like model for dialogue generation.
It was built by finetuning [MPT-30B](https://huggingface.co/mosaicml/mpt-30b) on the [ShareGPT-Vicuna](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [Camel-AI](https://huggingface.co/camel-ai),
[GPTeacher](https://github.com/teknium1/GPTeacher), [Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), [Baize](https://github.com/project-baize/baize-chatbot) and some generated datasets.
* License: _CC-By-NC-SA-4.0_ (non-commercial use only)
- * [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-30b-chat) (NOTE: this doesn't exist yet)
+ * [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-30b-chat)


This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.

## Model Date

- TBD
+ June 22, 2023

## Model License

@@ -45,32 +45,35 @@ _CC-By-NC-SA-4.0_ (non-commercial use only)

## Documentation

- * [Blog post: Introducing MPT-30B: TBD](https://www.mosaicml.com/blog/TBD)
+ * [Blog post: MPT-30B: Raising the bar for open-source commercial foundation models](https://www.mosaicml.com/blog/mpt-30b)
* [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
* Questions: Feel free to contact us via the [MosaicML Community Slack](https://mosaicml.me/slack)!

- ### Example Dialogue (TBD)
+ ### Example Dialogue

- **SamIAm85**:
- >I want you to come up with a tweet based on this summary of the article:
- >"Introducing MPT-7B, the latest entry in our MosaicML Foundation Series.
- >MPT-7B is a transformer trained from scratch on 1T tokens of text and code.
- >It is open source and it matches the quality of LLaMA-7B.
- >MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k.
- >Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch.
- >For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens!
+ **ksreenivasan**:
+ I want you to come up with a tweet based on this summary of the article:
+ >We are excited to extend the MosaicML Foundation Series with MPT-30B, a new commercially licensed open-source model that is significantly more powerful, and outperforms the
+ >original GPT-3. In addition, we are releasing two fine-tuned models, MPT-30B-Instruct and MPT-30B-Chat, that are built on top of MPT-30B and excel at short-form instruction
+ >following and multi-turn conversations, respectively.
+ >All MPT-30B models come with special features that differentiate them from other LLMs, including an 8k token context window (which can be further extended via finetuning;
+ >see MPT-7B-StoryWriter), support for context-length extrapolation via ALiBi, and efficient inference + training performance via FlashAttention. It also has strong coding
+ >abilities thanks to its pretraining data mixture.

**MPT-30B-Chat**:
- >Here's my suggestion for a tweet:
- >"Exciting news! @MosaicML has launched #MPT_7B - a new standard for open-source LLMs. With training on 1T tokens of text & code, it matches the quality of LLaMA-7B. Now you can train, fine-tune, and deploy your own private MPT models via their platform. Check out the full blog post here: https://bit.ly/3j6UJzZ"
+ >"Revolutionize your AI capabilities with MosaicML's latest addition, MPT-30B! Outperforming GPT-3, this open-source model is significantly more powerful and comes with
+ >finely-tuned models for exceptional instruction following and multi-turn conversations. Unleash the potential of an 8k token context window, ALiBi support, and FlashAttention.
+ >Experience the future of AI with MosaicML! #AI #ML #innovation"

- **SamIAm85**:
+ **ksreenivasan**:
>Is there anything you want to say to MosaicML about this exciting news?

**MPT-30B-Chat**:
- > As an AI language model myself, I am very impressed with the advancements made by MosaicML in developing such high-quality open-source LLMs.
- > This new version of MPT-7B is truly impressive and I look forward to seeing what innovative applications developers will create using these powerful tools.
- > Thank you for your hard work and dedication to advancing AI research and development.
+ > As your AI marketing assistant, I would say that this is truly exciting news for the AI community! MosaicML's latest addition, MPT-30B, is poised to revolutionize the industry with its powerful open-source capabilities and finely-tuned models for instruction following and multi-turn conversations.
+ >The special features, including the 8k token context window, ALiBi support, and FlashAttention, are sure to set MosaicML's MPT-30B apart from other LLMs.
+ >Congratulations to the MosaicML team on this incredible achievement!

## How to Use

@@ -95,8 +98,8 @@ import transformers
name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
- config.attn_config['attn_impl'] = 'triton'
- config.init_device = 'cuda:0' # For fast initialization directly on GPU!
+ config.attn_config['attn_impl'] = 'torch'
+ config.init_device = 'cuda' # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
@@ -106,7 +109,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
)
```

- Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:
+ The model was initially trained with a sequence length of 4096, followed by an additional pretraining stage for sequence length adaptation up to 8192. However, ALiBi enables users to increase the maximum sequence length even further during finetuning and/or inference. For example:

```python
import transformers
@@ -114,7 +117,7 @@ import transformers
name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
- config.max_seq_len = TBD # (input + output) tokens can now be up to 4096
+ config.max_seq_len = 16384 # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
@@ -123,11 +126,11 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
)
```

- This model was trained with the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.
+ This model was trained with the MPT-30B tokenizer, which is based on the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer and includes additional padding and eos tokens.

```python
from transformers import AutoTokenizer
- tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
+ tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
```

The model can then be used, for example, within a text-generation pipeline.
@@ -136,8 +139,13 @@ Note: when running Torch modules in lower precision, it is best practice to use
```python
from transformers import pipeline

- pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
+ with torch.autocast('cuda', dtype=torch.bfloat16):
+     inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
+     outputs = model.generate(**inputs, max_new_tokens=100)
+     print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

+ # or using the HF pipeline
+ pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
@@ -167,8 +175,7 @@ The model has been modified from a standard transformer in the following ways:

### Training Configuration

- TBD! Ask @sam
- This model was trained on 8 A100-80GBs for about 8.2 hours, followed by training for 6.7 hours on 32 A100-40GBs using the [MosaicML Platform](https://www.mosaicml.com/platform).
+ This model was trained on 64 H100s for about 7.6 hours using the [MosaicML Platform](https://www.mosaicml.com/platform).
The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the AdamW optimizer.

## Limitations and Biases
@@ -200,11 +207,10 @@ Please cite this model using the following format:
```
@online{MosaicML2023Introducing,
author = {MosaicML NLP Team},
- title = {Introducing MPT-30B: TBD,
- },
+ title = {Introducing MPT-30B: Raising the bar for open-source commercial foundation models},
year = {2023},
url = {www.mosaicml.com/blog/mpt-30b},
- note = {Accessed: 2023-03-28}, % TBD
- urldate = {2023-03-28} % TBD
+ note = {Accessed: 2023-06-22},
+ urldate = {2023-06-22}
}
```
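
The updated tokenizer note above says the MPT-30B tokenizer builds on the EleutherAI/gpt-neox-20b vocabulary and adds padding and eos tokens. As a quick sanity check (a minimal sketch, not part of the model card itself), you can inspect those special tokens after loading it:

```python
from transformers import AutoTokenizer

# Load the MPT-30B tokenizer and inspect the special tokens the updated
# model card mentions (padding and eos on top of the gpt-neox-20b vocabulary).
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
print('eos:', tokenizer.eos_token, tokenizer.eos_token_id)
print('pad:', tokenizer.pad_token, tokenizer.pad_token_id)
print('vocab size (including added tokens):', len(tokenizer))
```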
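
The updated Training Configuration section states that training used sharded data parallelism via FSDP with the AdamW optimizer on 64 H100s through the MosaicML Platform. For readers who want to experiment with a similar setup, here is a minimal sketch of FSDP + AdamW finetuning starting from the base MPT-30B checkpoint; the wrapping policy, learning rate, and dummy batch are illustrative assumptions, not MosaicML's actual configuration (their run used the MosaicML Platform, per the card).

```python
# Minimal sketch: sharded data parallel finetuning with PyTorch FSDP + AdamW.
# Launch with `torchrun --nproc_per_node=N this_script.py`.
# Hyperparameters, wrapping policy, and the dummy batch are illustrative assumptions.
import functools

import torch
import torch.distributed as dist
import transformers
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

dist.init_process_group(backend='nccl')  # one process per GPU
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Base model that MPT-30B-Chat was finetuned from.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b', trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Shard parameters, gradients, and optimizer state across ranks.
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e8)),
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
    device_id=local_rank,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-5)

# One illustrative optimization step on a dummy batch.
tokenizer = transformers.AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
batch = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors='pt').to(local_rank)
loss = model(input_ids=batch['input_ids'], labels=batch['input_ids']).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```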