---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
datasets:
- the_pile_books3
tags:
- mosaicML
- sharded
- story
---
|
|
|
# mpt-7b-storywriter: sharded
|
|
|
|
|
<a href="https://colab.research.google.com/gist/pszemraj/a979cdcc02edb916661c5dd97cf2294e/mpt-storywriter-sharded-inference.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |
|
|
|
This is a version of the [mpt-7b-storywriter](https://huggingface.co/mosaicml/mpt-7b-storywriter) model, sharded into 2 GB chunks for loading in low-RAM environments (e.g., Colab). The weights are stored in `bfloat16`, so in principle the model can also run on CPU, though generation will be very slow.
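
If you want to confirm the sharding before downloading, here is a quick sketch using `huggingface_hub` (already a dependency of `transformers`); the filename filter is an assumption about how the shards are named:

```python
from huggingface_hub import list_repo_files

# list the repo contents; sharded checkpoints are typically split into
# files named like pytorch_model-0000x-of-0000y.bin, each ~2 GB
files = list_repo_files("ethzanalytics/mpt-7b-storywriter-sharded")
print([f for f in files if f.endswith(".bin")])
```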
|
|
|
Please refer to the original repository linked above for details on usage and implementation. The weights were downloaded from the original repo under the Apache-2.0 license and are redistributed under the same license.
|
|
|
|
|
## Basic Usage
|
|
|
> **Note:** this is **not** an instruction-tuned model, so you need to give it enough input text for it to continue generating something on-topic with your prompt.
|
Install/upgrade packages:
|
|
|
```bash
pip install -U torch transformers accelerate einops
```
|
|
|
Load the model:
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'ethzanalytics/mpt-7b-storywriter-sharded'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    revision='197d14245ad874da82194248cab1ce8cf87fa713',  # optional, but a good idea
    device_map='auto',
    load_in_8bit=False,  # install bitsandbytes, then set to True for 8-bit loading
)
model = torch.compile(model)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
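
To load in 8-bit instead (roughly halving memory use), a minimal variant of the call above, assuming `bitsandbytes` is installed, would look like:

```python
# 8-bit loading via bitsandbytes (pip install bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map='auto',
    load_in_8bit=True,
)
```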
|
|
|
Then you can use `model.generate()` as you normally would; see the notebook linked above for details.
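
As a minimal sketch, continuing from the loading snippet above (the prompt and sampling settings here are illustrative, not taken from the original repo):

```python
# give the model a story opening to continue, since it is not instruction-tuned
prompt = (
    "The ship had been adrift for nine days when the fog finally lifted, "
    "and the first thing the captain saw was the lighthouse."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,  # the MPT tokenizer has no pad token by default
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```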
|
|
|
|
|
---