Update README.md
README.md CHANGED
@@ -8,10 +8,9 @@ inference: false
 # Monarch Mixer-BERT
 
-This model has been pretrained with sequence length 2048, and it has been fine-tuned for long-context retrieval.
-Check out our [blog post]() for more on how we trained this model for long sequence.
+An 80M checkpoint of M2-BERT, pretrained with sequence length 2048 and fine-tuned for long-context retrieval.
+Check out the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109) and our [blog post]() on retrieval for more on how we trained this model for long sequences.
 
 This model was trained by Jon Saad-Falcon, Dan Fu, and Simran Arora.
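For context, a minimal usage sketch (not part of this diff) for loading an M2-BERT retrieval checkpoint with Hugging Face `transformers`. The checkpoint id, the `bert-base-uncased` tokenizer choice, and the `sentence_embedding` output key are assumptions about how this model is typically published, not details taken from the README above.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# M2-BERT ships custom modeling code, so trust_remote_code=True is required.
# The checkpoint id below is an assumption for illustration.
model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-2k-retrieval",
    trust_remote_code=True,
)

# Assumed: M2-BERT reuses the bert-base-uncased tokenizer; pad inputs out to
# the 2048-token pretraining length described in the model card.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=2048)
inputs = tokenizer(
    "Monarch Mixer replaces attention with sub-quadratic Monarch matrix mixing.",
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=2048,
)

with torch.no_grad():
    outputs = model(**inputs)

# "sentence_embedding" is the assumed key for the retrieval embedding.
embedding = outputs["sentence_embedding"]
print(embedding.shape)  # expected roughly (1, 768) for an 80M checkpoint
```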