Tags: Text Generation · Transformers · PyTorch · mpt · Composer · MosaicML · llm-foundry · custom_code · text-generation-inference
abhi-mosaic committed
Commit: 1592fc2
Parent(s): 4ca9cce

Update README.md

Files changed (1): README.md (+23 -5)
README.md CHANGED
@@ -67,7 +67,10 @@ This model is best used with the MosaicML [llm-foundry repository](https://githu
 
 ```python
 import transformers
-model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-chat', trust_remote_code=True)
+model = transformers.AutoModelForCausalLM.from_pretrained(
+  'mosaicml/mpt-7b-chat',
+  trust_remote_code=True
+)
 ```
 Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
 This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
@@ -75,19 +78,34 @@ This is because we use a custom `MPT` model architecture that is not yet part of
 
 To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and move the model to `bfloat16`:
 ```python
-config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b-chat', trust_remote_code=True)
+config = transformers.AutoConfig.from_pretrained(
+  'mosaicml/mpt-7b-chat',
+  trust_remote_code=True
+)
 config.attn_config['attn_impl'] = 'triton'
 
-model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-chat', config=config, torch_dtype=torch.bfloat16, trust_remote_code=True)
+model = transformers.AutoModelForCausalLM.from_pretrained(
+  'mosaicml/mpt-7b-chat',
+  config=config,
+  torch_dtype=torch.bfloat16,
+  trust_remote_code=True
+)
 model.to(device='cuda:0')
 ```
 
 Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:
 
 ```python
-config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b-chat', trust_remote_code=True)
+config = transformers.AutoConfig.from_pretrained(
+  'mosaicml/mpt-7b-chat',
+  trust_remote_code=True
+)
 config.update({"max_seq_len": 4096})
-model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-chat', config=config, trust_remote_code=True)
+model = transformers.AutoModelForCausalLM.from_pretrained(
+  'mosaicml/mpt-7b-chat',
+  config=config,
+  trust_remote_code=True
+)
 ```
 
 This model was trained with the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.
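
For context, here is a minimal end-to-end generation sketch that combines the loading snippet from the updated README with the EleutherAI/gpt-neox-20b tokenizer mentioned above. The prompt text, the `max_new_tokens` value, and the use of `bfloat16` outside the triton path are illustrative assumptions, not part of the model card.

```python
import torch
import transformers

# Tokenizer the model card says was used for training.
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# Load the model as shown in the updated README; the custom MPT architecture
# requires trust_remote_code=True. bfloat16 here is an assumption to reduce
# GPU memory, not a requirement from the README.
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-chat',
  torch_dtype=torch.bfloat16,
  trust_remote_code=True
)
model.to(device='cuda:0')
model.eval()

# Illustrative prompt; the model card excerpt above does not specify a chat format.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors='pt').to('cuda:0')

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```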