kartikmosaicml committed cef9067 (parent: 6e6da7b): Updating Readme and model card.

README.md CHANGED

MPT-30B-Chat is a chatbot-like model for dialogue generation.
It was built by finetuning [MPT-30B](https://huggingface.co/mosaicml/mpt-30b) on the [ShareGPT-Vicuna](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [Camel-AI](https://huggingface.co/camel-ai),
[GPTeacher](https://github.com/teknium1/GPTeacher), [Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), [Baize](https://github.com/project-baize/baize-chatbot), and some generated datasets.

* License: _CC-By-NC-SA-4.0_ (non-commercial use only)
* [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-30b-chat)

This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.

## Model Date

June 22, 2023

## Model License

_CC-By-NC-SA-4.0_ (non-commercial use only)

## Documentation

* [Blog post: MPT-30B: Raising the bar for open-source commercial foundation models](https://www.mosaicml.com/blog/mpt-30b)
* [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
* Questions: Feel free to contact us via the [MosaicML Community Slack](https://mosaicml.me/slack)!

### Example Dialogue

**ksreenivasan**:
> I want you to come up with a tweet based on this summary of the article:
>
> We are excited to extend the MosaicML Foundation Series with MPT-30B, a new commercially licensed open-source model that is significantly more powerful, and outperforms the original GPT-3. In addition, we are releasing two fine-tuned models, MPT-30B-Instruct and MPT-30B-Chat, that are built on top of MPT-30B and excel at short-form instruction following and multi-turn conversations, respectively.
> All MPT-30B models come with special features that differentiate them from other LLMs, including an 8k token context window (which can be further extended via finetuning; see MPT-7B-StoryWriter), support for context-length extrapolation via ALiBi, and efficient inference + training performance via FlashAttention. It also has strong coding abilities thanks to its pretraining data mixture.

**MPT-30B-Chat**:
> "Revolutionize your AI capabilities with MosaicML's latest addition, MPT-30B! Outperforming GPT-3, this open-source model is significantly more powerful and comes with finely-tuned models for exceptional instruction following and multi-turn conversations. Unleash the potential of an 8k token context window, ALiBi support, and FlashAttention. Experience the future of AI with MosaicML! #AI #ML #innovation"

**ksreenivasan**:
> Is there anything you want to say to MosaicML about this exciting news?

**MPT-30B-Chat**:
> As your AI marketing assistant, I would say that this is truly exciting news for the AI community! MosaicML's latest addition, MPT-30B, is poised to revolutionize the industry with its powerful open-source capabilities and finely-tuned models for instruction following and multi-turn conversations.
> The special features, including the 8k token context window, ALiBi support, and FlashAttention, are sure to set MosaicML's MPT-30B apart from other LLMs.
> Congratulations to the MosaicML team on this incredible achievement!

## How to Use

```python
import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'torch'
config.init_device = 'cuda'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,           # assumed: pass the modified config (this argument is not shown in the diff hunk)
  trust_remote_code=True   # assumed: required for MPT's custom model code
)
```
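
If the attention-implementation and device overrides above are not needed, the load can stay minimal. The sketch below is an assumption built only from the `transformers` calls already shown on this card; `trust_remote_code=True` is still required because MPT ships custom model code:

```python
import transformers

# Minimal load with default settings; MPT's custom model class requires trust_remote_code=True.
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b-chat',
    trust_remote_code=True
)
```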

The model was trained initially with a sequence length of 4096, with an additional pretraining stage for sequence-length adaptation up to 8192. However, ALiBi enables users to increase the maximum sequence length even further during finetuning and/or inference. For example:

```python
import transformers

name = 'mosaicml/mpt-30b-chat'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
  config=config,           # assumed: pass the modified config (this argument is not shown in the diff hunk)
  trust_remote_code=True   # assumed: required for MPT's custom model code
)
```
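
To see the effect of the larger `config.max_seq_len`, a quick check like the one below counts the tokens in a long prompt against the configured limit. This is an illustrative sketch: the prompt text is a placeholder, and the tokenizer load simply anticipates the tokenizer section that follows.

```python
from transformers import AutoTokenizer

# Count tokens in a long prompt and compare against the extended context window.
tok = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
long_prompt = "Summarize the following document:\n" + "lorem ipsum " * 2000  # placeholder text
n_tokens = len(tok(long_prompt)['input_ids'])
print(f"Prompt uses {n_tokens} of {config.max_seq_len} (input + output) tokens")
```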

This model was trained with the MPT-30B tokenizer, which is based on the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer and includes additional padding and eos tokens.

```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
```
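
As a quick sanity check on the extra tokens mentioned above, the illustrative snippet below prints the tokenizer's eos and padding tokens; it only reuses the `tokenizer` object just created.

```python
# Inspect the special tokens the MPT-30B tokenizer defines on top of the gpt-neox-20b base.
print("eos token:", tokenizer.eos_token)
print("pad token:", tokenizer.pad_token)
print("vocabulary size:", len(tokenizer))
```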

The model can then be used, for example, within a text-generation pipeline.
Note: when running Torch modules in lower precision, it is best practice to use the [torch.autocast context manager](https://pytorch.org/docs/stable/amp.html).

```python
import torch
from transformers import pipeline

with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# or using the HF pipeline
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
             max_new_tokens=100,   # assumed generation settings; the rest of this call falls outside the diff hunk
             do_sample=True,
             use_cache=True))
```

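No prompt template for multi-turn chat is shown here, so the sketch below is just one assumed way to turn a conversation like the example dialogue above into a single prompt string; the `build_prompt` helper and its speaker tags are hypothetical.

```python
# Hypothetical multi-turn prompt builder; the exact chat format MPT-30B-Chat expects
# is not specified here, so treat this formatting as an assumption.
def build_prompt(turns):
    """turns: list of (speaker, message) pairs, e.g. [('user', 'Hi!'), ('assistant', 'Hello!')]."""
    lines = [f"{speaker}: {message}" for speaker, message in turns]
    lines.append("assistant:")  # ask the model to continue as the assistant
    return "\n".join(lines)

prompt = build_prompt([("user", "Write a tweet announcing MPT-30B.")])
with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
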
### Training Configuration

This model was trained on 64 H100s for about 7.6 hours using the [MosaicML Platform](https://www.mosaicml.com/platform).
The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the AdamW optimizer.

## Limitations and Biases

## Citation

Please cite this model using the following format:

```
@online{MosaicML2023Introducing,
    author  = {MosaicML NLP Team},
    title   = {Introducing MPT-30B: Raising the bar for open-source commercial foundation models},
    year    = {2023},
    url     = {www.mosaicml.com/blog/mpt-30b},
    note    = {Accessed: 2023-06-22},
    urldate = {2023-06-22}
}
```