jacobfulano committed on
Commit 785d4c8
1 Parent(s): ce11e47

Update README


[x] mention mosaicbert.github.io
[x] change citation
[x] change code snippet to make things clearer
[x] explain how to use alibi

Files changed (1)
  1. README.md +35 -25
README.md CHANGED
@@ -9,46 +9,57 @@ inference: false

# MosaicBERT-Base model

- MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining.
+ MosaicBERT-Base is a custom BERT architecture and training recipe optimized for fast pretraining.
MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against
Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased).

+ This study motivated many of the architecture choices around MosaicML's [MPT-7B](https://huggingface.co/mosaicml/mpt-7b) and [MPT-30B](https://huggingface.co/mosaicml/mpt-30b) models.
+
## Model Date

March 2023

## Documentation

- * [Blog post](https://www.mosaicml.com/blog/mosaicbert)
+ * [Project Page (mosaicbert.github.io)](https://mosaicbert.github.io)
* [Github (mosaicml/examples/tree/main/examples/benchmarks/bert)](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert)
+ * [Paper (NeurIPS 2023)](https://openreview.net/forum?id=5zipcfLC2Z)
+ * Colab Tutorials:
+   * [MosaicBERT Tutorial Part 1: Load Pretrained Weights and Experiment with Sequence Length Extrapolation Using ALiBi](https://colab.research.google.com/drive/1r0A3QEbu4Nzs2Jl6LaiNoW5EumIVqrGc?usp=sharing)
+ * [Blog Post (March 2023)](https://www.mosaicml.com/blog/mosaicbert)

## How to use

```python
- from transformers import AutoModelForMaskedLM
- mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
- ```
-
- The tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
-
- ```python
- from transformers import BertTokenizer
- tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
- ```
-
- To use this model directly for masked language modeling, use `pipeline`:
-
- ```python
- from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
-
- tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
- mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
-
- classifier = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
-
- classifier("I [MASK] to the store yesterday.")
+ import transformers
+ from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
+
+ tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  # MosaicBERT uses the standard BERT tokenizer
+
+ config = transformers.BertConfig.from_pretrained('mosaicml/mosaic-bert-base')  # the config needs to be passed in
+ mosaicbert = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', config=config, trust_remote_code=True)
+
+ # To use this model directly for masked language modeling
+ mosaicbert_classifier = pipeline('fill-mask', model=mosaicbert, tokenizer=tokenizer, device="cpu")
+ mosaicbert_classifier("I [MASK] to the store yesterday.")
+ ```
+
+ Note that the tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
+
+ To take advantage of ALiBi and extrapolate to longer sequence lengths, simply change the `alibi_starting_size` flag in the
+ config and reload the model.
+
+ ```python
+ config = transformers.BertConfig.from_pretrained('mosaicml/mosaic-bert-base')
+ config.alibi_starting_size = 1024  # maximum sequence length updated to 1024
+
+ mosaicbert = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', config=config, trust_remote_code=True)
```

+ This simply presets the non-learned linear bias matrix in every attention block to 1024 tokens (note that this particular model was trained with a sequence length of 128 tokens).
+
**To continue MLM pretraining**, follow the [MLM pre-training section of the mosaicml/examples/benchmarks/bert repo](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert#pre-training).

**To fine-tune this model for classification**, follow the [Single-task fine-tuning section of the mosaicml/examples/benchmarks/bert repo](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert#fine-tuning).
@@ -58,7 +69,7 @@ classifier("I [MASK] to the store yesterday.")
This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method. This is because we train using [FlashAttention (Dao et al. 2022)](https://arxiv.org/pdf/2205.14135.pdf), which is not part of the `transformers` library and depends on [Triton](https://github.com/openai/triton) and some custom PyTorch code. Since this involves executing arbitrary code, you should consider passing a git `revision` argument that specifies the exact commit of the code, for example:

```python
- mlm = AutoModelForMaskedLM.from_pretrained(
+ mosaicbert = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base',
    trust_remote_code=True,
    revision='24512df',
@@ -182,12 +193,11 @@ This model is intended to be finetuned on downstream tasks.
Please cite this model using the following format:

```
- @online{Portes2023MosaicBERT,
-   author = {Jacob Portes and Alex Trott and Daniel King and Sam Havens},
-   title = {MosaicBERT: Pretraining BERT from Scratch for \$20},
-   year = {2023},
-   url = {https://www.mosaicml.com/blog/mosaicbert},
-   note = {Accessed: 2023-03-28}, % change this date
-   urldate = {2023-03-28} % change this date
+ @article{portes2023MosaicBERT,
+   title = {MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining},
+   author = {Jacob Portes and Alexander R Trott and Sam Havens and Daniel King and Abhinav Venigalla and
+             Moin Nadeem and Nikhil Sardana and Daya Khudia and Jonathan Frankle},
+   journal = {NeurIPS, https://openreview.net/pdf?id=5zipcfLC2Z},
+   year = {2023},
}
```
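The ALiBi snippet added to the README stops at reloading the model; below is a minimal usage sketch of the extrapolation it enables, reusing the fill-mask pipeline from the "How to use" section. The filler text and the `model_max_length=1024` tokenizer override are illustrative choices, not part of the model card.

```python
import transformers
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline

# Reload with a larger ALiBi bias matrix, as described in the README
config = transformers.BertConfig.from_pretrained('mosaicml/mosaic-bert-base')
config.alibi_starting_size = 1024
mosaicbert = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base', config=config, trust_remote_code=True
)

# Raise the tokenizer's max length so inputs longer than 512 tokens are not flagged;
# the bert-base-uncased vocabulary itself is unchanged
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', model_max_length=1024)

mosaicbert_classifier = pipeline('fill-mask', model=mosaicbert, tokenizer=tokenizer, device="cpu")

# Roughly 700 tokens of filler followed by a masked sentence: well past the 128-token
# training length, but within the 1024-position bias matrix set above
long_input = "The weather was unremarkable that week. " * 70 + "I [MASK] to the store yesterday."
predictions = mosaicbert_classifier(long_input)
print([p['token_str'] for p in predictions])
```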
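For fine-tuning, the README points to the mosaicml/examples repo as the supported recipe. As a plain `transformers` starting point, here is a hedged sketch using the Hugging Face `Trainer` on SST-2. It assumes the remote code also exposes a sequence-classification head via `AutoModelForSequenceClassification`, which the README does not confirm, and the hyperparameters are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, BertTokenizer,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Assumption: the remote code maps AutoModelForSequenceClassification to a MosaicBERT
# classification head; if it does not, use the mosaicml/examples recipe instead
model = AutoModelForSequenceClassification.from_pretrained(
    'mosaicml/mosaic-bert-base', trust_remote_code=True, num_labels=2
)

dataset = load_dataset('glue', 'sst2')

def tokenize(batch):
    # 128 tokens matches the pretraining sequence length noted in the README
    return tokenizer(batch['sentence'], truncation=True, padding='max_length', max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='mosaicbert-sst2',
                           per_device_train_batch_size=32,
                           learning_rate=5e-5,
                           num_train_epochs=3),
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
)
trainer.train()
```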