Files changed (1)
  1. README.md +3 -4
README.md CHANGED
@@ -19,9 +19,9 @@ inference: false
  MPT-30B is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code.
  This model was trained by [MosaicML](https://www.mosaicml.com).
 
- MPT-30B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
+ MPT-30B is part of the family of Mosaic Pretrained Transformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
 
- MPT-30B comes with special features that differentiate them from other LLMs, including an 8k token context window (which can be further extended via finetuning; see [MPT-7B-StoryWriter](https://huggingface.co/mosaicml/mpt-7b-storywriter)), support for context-length extrapolation via [ALiBi](https://arxiv.org/abs/2108.12409), and efficient inference + training performance via FlashAttention. It also has strong coding abilities thanks to its pretraining mix. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer).
+ MPT-30B comes with special features that differentiate it from other LLMs, including an 8k token context window (which can be further extended via finetuning; see [MPT-7B-StoryWriter](https://huggingface.co/mosaicml/mpt-7b-storywriter)), support for context-length extrapolation via [ALiBi](https://arxiv.org/abs/2108.12409), and efficient inference + training via FlashAttention. It also has strong coding abilities thanks to its pretraining mix. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer).
  The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU—either 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision.
 
  This model uses the MosaicML LLM codebase, which can be found in the [llm-foundry repository](https://github.com/mosaicml/llm-foundry). It was trained by MosaicML’s NLP team on the [MosaicML platform](https://www.mosaicml.com/training) for LLM pretraining, finetuning, and inference.
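For the single-GPU deployment note in this hunk (1xA100-40GB in 8-bit precision), a minimal loading sketch follows. It is not part of the model card diff itself; it assumes the `bitsandbytes` and `accelerate` packages are installed and that 8-bit weight quantization is acceptable for the use case.

```python
import transformers

name = 'mosaicml/mpt-30b'

# Hypothetical 8-bit load so the 30B checkpoint fits on a single A100-40GB.
# Quantization is handled by bitsandbytes; device placement by accelerate.
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,       # 8-bit weights (requires bitsandbytes)
    device_map='auto',       # place layers on the available GPU automatically
    trust_remote_code=True   # MPT ships custom modeling code on the Hub
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
```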
@@ -32,7 +32,7 @@ This model uses the MosaicML LLM codebase, which can be found in the [llm-foundr
  MPT-30B is:
  * **Licensed for the possibility of commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
  * **Trained on a large amount of data** (1T tokens like [LLaMA](https://arxiv.org/abs/2302.13971) vs. 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
- * **Prepared to handle extremely long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409) (TODO: talk about MPT-30B-instruct finetuned on 8k).
+ * **Prepared to handle extremely long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409).
  * **Capable of fast training and inference** (via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer))
  * **Equipped with highly efficient open-source training code** via the [llm-foundry repository](https://github.com/mosaicml/llm-foundry)
 
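The "extremely long inputs" bullet in this hunk rests on ALiBi, which lets the maximum sequence length be raised at load time. A minimal sketch, following the pattern used elsewhere in the MPT model cards; the 16384 value is only an illustrative choice, not a recommendation from this README:

```python
import transformers

name = 'mosaicml/mpt-30b'

# Raise the context window beyond the 8k training length via ALiBi extrapolation.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # (input + output) tokens can now be up to 16384

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True
)
```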
@@ -90,7 +90,6 @@ name = 'mosaicml/mpt-30b'
 
  config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
  config.attn_config['attn_impl'] = 'torch' # change this to use triton
- config.init_device = 'cpu' # For fast initialization directly on GPU if you have enough memory
 
  model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
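This hunk drops the `config.init_device = 'cpu'` line, whose comment referred to GPU initialization. For reference, a sketch of the faster configuration it alluded to; this assumes a CUDA GPU and that the triton/FlashAttention prerequisites from llm-foundry are installed, otherwise keep `attn_impl = 'torch'`:

```python
import torch
import transformers

name = 'mosaicml/mpt-30b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # use the triton FlashAttention kernels
config.init_device = 'cuda:0'               # initialize weights directly on the GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # 16-bit precision, fits on a single A100-80GB
    trust_remote_code=True
)
```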
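Once loaded, the model can be served through a standard HuggingFace text-generation pipeline, as the card notes. A brief, self-contained usage sketch; the prompt and generation settings are illustrative only:

```python
import torch
from transformers import pipeline

# Let the pipeline download, place, and wrap the model itself.
pipe = pipeline(
    'text-generation',
    model='mosaicml/mpt-30b',
    tokenizer='mosaicml/mpt-30b',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map='auto'
)

print(pipe('Here is a recipe for vegan banana bread:\n',
           max_new_tokens=100, do_sample=True, use_cache=True))
```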