Tags: Text Generation · Transformers · PyTorch · mpt · Composer · MosaicML · llm-foundry · custom_code · text-generation-inference
abhi-mosaic committed · Commit cfc57ba · 1 Parent(s): 9673b24

Update README.md

Files changed (1):
  1. README.md +23 -5
README.md CHANGED
@@ -50,7 +50,10 @@ It includes options for many training efficiency features such as [FlashAttentio

  ```python
  import transformers
- model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-instruct', trust_remote_code=True)
+ model = transformers.AutoModelForCausalLM.from_pretrained(
+   'mosaicml/mpt-7b-instruct',
+   trust_remote_code=True
+ )
  ```
  Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
  This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.

@@ -58,19 +61,34 @@ This is because we use a custom `MPT` model architecture that is not yet part of

  To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
  ```python
- config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b-instruct', trust_remote_code=True)
+ config = transformers.AutoConfig.from_pretrained(
+   'mosaicml/mpt-7b-instruct',
+   trust_remote_code=True
+ )
  config.attn_config['attn_impl'] = 'triton'

- model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-instruct', config=config, torch_dtype=torch.bfloat16, trust_remote_code=True)
+ model = transformers.AutoModelForCausalLM.from_pretrained(
+   'mosaicml/mpt-7b-instruct',
+   config=config,
+   torch_dtype=torch.bfloat16,
+   trust_remote_code=True
+ )
  model.to(device='cuda:0')
  ```

  Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

  ```python
- config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b', trust_remote_code=True)
+ config = transformers.AutoConfig.from_pretrained(
+   'mosaicml/mpt-7b-instruct',
+   trust_remote_code=True
+ )
  config.update({"max_seq_len": 4096})
- model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b', config=config, trust_remote_code=True)
+ model = transformers.AutoModelForCausalLM.from_pretrained(
+   'mosaicml/mpt-7b-instruct',
+   config=config,
+   trust_remote_code=True
+ )
  ```

  This model was trained with the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.
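
One practical note on the triton snippet in the diff: it passes `torch_dtype=torch.bfloat16`, but the README's earlier import block only brings in `transformers`. A consolidated, runnable sketch of that load path (the only addition beyond what the diff shows is `import torch`):

```python
# Minimal runnable sketch of the triton / bfloat16 load path shown in the diff above.
# `import torch` is needed because the snippet references torch.bfloat16.
import torch
import transformers

name = 'mosaicml/mpt-7b-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # MPT-specific config key, as in the diff

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model.to(device='cuda:0')  # move to a CUDA device, as in the README snippet
```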
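
A hedged usage sketch for the `max_seq_len` override as well: with ALiBi and the config update from the diff, inputs longer than the 2048-token training length should be accepted. The prompt construction and generation settings below are illustrative assumptions, not part of the README or this commit:

```python
# Sketch: raise max_seq_len to 4096 (as in the diff) and feed an input longer than 2048 tokens.
import transformers

name = 'mosaicml/mpt-7b-instruct'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.update({"max_seq_len": 4096})
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)

tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
long_prompt = 'word ' * 3000                  # illustrative: on the order of 3000 tokens
inputs = tokenizer(long_prompt, return_tensors='pt')
print(inputs['input_ids'].shape[-1])          # well past the original 2048 limit
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```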
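
Finally, the README names the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer but the diff never shows it being loaded. A hedged end-to-end sketch follows; the `pipeline` call, prompt, and sampling parameters are illustrative assumptions rather than anything from this commit:

```python
# Sketch: load the tokenizer the README names and run generation through a pipeline.
import torch
import transformers

name = 'mosaicml/mpt-7b-instruct'

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

pipe = transformers.pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device='cuda:0',
)
out = pipe(
    'Here is a short note about ALiBi:\n',  # illustrative prompt
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(out[0]['generated_text'])
```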