kartikmosaicml
commited on
Commit
•
2994d94
1
Parent(s):
1db8429
Updating readme with additional details
Browse files
README.md
CHANGED
@@ -22,9 +22,11 @@ This model was trained by [MosaicML](https://www.mosaicml.com).
|
|
22 |
MPT-30B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
|
23 |
|
24 |
MPT-30B comes with special features that differentiate them from other LLMs, including an 8k token context window (which can be further extended via finetuning; see [MPT-7B-StoryWriter](https://huggingface.co/mosaicml/mpt-7b-storywriter)), support for context-length extrapolation via [ALiBi](https://arxiv.org/abs/2108.12409), and efficient inference + training performance via FlashAttention. It also has strong coding abilities thanks to its pretraining mix. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer).
|
|
|
25 |
|
26 |
This model uses the MosaicML LLM codebase, which can be found in the [llm-foundry repository](https://github.com/mosaicml/llm-foundry). It was trained by MosaicML’s NLP team on the [MosaicML platform](https://www.mosaicml.com/training) for LLM pretraining, finetuning, and inference.
|
27 |
|
|
|
28 |
### How is this model different?
|
29 |
|
30 |
MPT-30B is:
|
|
|
22 |
MPT-30B is part of the family of MosaicPretrainedTransformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
|
23 |
|
24 |
MPT-30B comes with special features that differentiate them from other LLMs, including an 8k token context window (which can be further extended via finetuning; see [MPT-7B-StoryWriter](https://huggingface.co/mosaicml/mpt-7b-storywriter)), support for context-length extrapolation via [ALiBi](https://arxiv.org/abs/2108.12409), and efficient inference + training performance via FlashAttention. It also has strong coding abilities thanks to its pretraining mix. MPT models can also be served efficiently with both standard HuggingFace pipelines and NVIDIA's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer).
|
25 |
+
The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU—either 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision.
|
26 |
|
27 |
This model uses the MosaicML LLM codebase, which can be found in the [llm-foundry repository](https://github.com/mosaicml/llm-foundry). It was trained by MosaicML’s NLP team on the [MosaicML platform](https://www.mosaicml.com/training) for LLM pretraining, finetuning, and inference.
|
28 |
|
29 |
+
|
30 |
### How is this model different?
|
31 |
|
32 |
MPT-30B is:
|