danfu09 committed
Commit 3e426fb
1 Parent(s): 3fdab0e

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -8,10 +8,9 @@ inference: false
 
 # Monarch Mixer-BERT
 
-The 80M checkpoint for M2-BERT-base from the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109).
-This model has been pretrained with sequence length 2048, and it has been fine-tuned for long-context retrieval.
+An 80M checkpoint of M2-BERT, pretrained with sequence length 2048 and fine-tuned for long-context retrieval.
 
-Check out our [blog post]() for more on how we trained this model for long sequence.
+Check out the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109) and our [blog post]() on retrieval for more on how we trained this model for long sequences.
 
 This model was trained by Jon Saad-Falcon, Dan Fu, and Simran Arora.
 
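
A minimal usage sketch for loading a checkpoint like this one with Hugging Face `transformers`. The model ID, the `bert-base-uncased` tokenizer, and the `sentence_embedding` output key are assumptions based on how M2-BERT retrieval checkpoints are typically packaged, not something this commit specifies; substitute this repo's actual ID. The 2048 maximum sequence length matches the pretraining length stated above.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "togethercomputer/m2-bert-80M-2k-retrieval"  # assumed ID; replace with this repo's ID
MAX_SEQ_LENGTH = 2048  # matches the pretraining sequence length stated above

# M2-BERT ships its architecture as custom code on the Hub,
# so trust_remote_code=True is required to instantiate it.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, trust_remote_code=True
)
model.eval()

# Assumption: M2-BERT checkpoints reuse the bert-base-uncased tokenizer.
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased", model_max_length=MAX_SEQ_LENGTH
)

inputs = tokenizer(
    ["Every morning, I make a cup of coffee to start my day."],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=MAX_SEQ_LENGTH,
    return_token_type_ids=False,
)

with torch.no_grad():
    outputs = model(**inputs)

# Assumed output key: the retrieval checkpoints expose a pooled
# embedding under "sentence_embedding" via their remote code.
embeddings = outputs["sentence_embedding"]
print(embeddings.shape)  # e.g. torch.Size([1, 768])
```

The resulting embeddings can be compared with cosine similarity for long-context retrieval, which is the use case this fine-tune targets.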