Add citation, delete amp
README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0

**English** | [中文](./README_zh.md)

-## Code implementation of new GTE
+## Code implementation of new GTE encoders

This model is a BERT-like encoder with the following optimizations implemented:

@@ -12,7 +12,6 @@ This model is a BERT-like encoder with the following optimizations implemented:
2. Substituting the conventional activation functions with Gated Linear Units (GLU) [^2].
3. Setting attention dropout to 0 to use `xformers` and `flash_attn`.
4. Using unpadding to eliminate the needless computations for padding tokens [^3]. (this is off by default and should be used in conjunction with `xformers` for optimal acceleration).
-5. Setting `vocab_size` as a multiple of 64.

### Recommendation: Enable Unpadding and Acceleration with `xformers`

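As a quick illustration of the GLU variant mentioned in item 2 of the hunk above, a gated feed-forward block can be sketched roughly as follows. This is a minimal, hypothetical example (class and attribute names are made up), not the modules used in the model's remote code:

```python
import torch
import torch.nn as nn


class GLUFeedForward(nn.Module):
    """Minimal GLU-style (GEGLU) feed-forward block, replacing the usual
    Linear -> activation -> Linear MLP."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # A single projection produces both the value and the gate halves.
        self.up_gate_proj = nn.Linear(hidden_size, 2 * intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        up, gate = self.up_gate_proj(hidden_states).chunk(2, dim=-1)
        # Element-wise gating: the activated gate modulates the value branch.
        return self.down_proj(up * self.act(gate))
```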
@@ -31,7 +30,8 @@ elif pytorch is installed using pip:
```
For more information, refer to [Installing xformers](https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers).

-Then, when loading the model, set `unpad_inputs` and `use_memory_efficient_attention` to `true`,
+Then, when loading the model, set `unpad_inputs` and `use_memory_efficient_attention` to `true`,
+and set `torch_dtype` to `torch.float16` (or `torch.bfloat16`) to achieve the acceleration.

```python
import torch
@@ -45,15 +45,18 @@ model = AutoModel.from_pretrained(
    trust_remote_code=True,
    unpad_inputs=True,
    use_memory_efficient_attention=True,
+    torch_dtype=torch.float16
).to(device)

-
-
-
+inputs = tokenzier(['test input'], truncation=True, max_length=8192, padding=True, return_tensors='pt')
+
+with torch.inference_mode():
+    outputs = model(**inputs.to(device))

```

-Alternatively, you can directly modify the `unpad_inputs` and `use_memory_efficient_attention` settings to `true` in the model's `config.json`,
+Alternatively, you can directly modify the `unpad_inputs` and `use_memory_efficient_attention` settings to `true` in the model's `config.json`,
+eliminating the need to set them in the code.


---
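The snippet in this hunk ends with the raw `outputs`; a possible follow-up step (not part of this commit — the CLS-token pooling below is an assumption and should be checked against the model card) would be along these lines:

```python
import torch.nn.functional as F

# Assumes `outputs` from the example above; many GTE-style encoders use the
# first ([CLS]) token of the last hidden state as the sentence embedding.
embeddings = outputs.last_hidden_state[:, 0]
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit norm, so dot products are cosine similarities
scores = embeddings @ embeddings.T                # pairwise similarity matrix
```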
@@ -73,6 +76,20 @@ Without the outstanding work of `nomicai`, the release of `gte-v1.5` could have

---

+## Citation
+```
+@misc{zhang2024mgte,
+      title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
+      author={Xin Zhang and Yanzhao Zhang and Dingkun Long and Wen Xie and Ziqi Dai and Jialong Tang and Huan Lin and Baosong Yang and Pengjun Xie and Fei Huang and Meishan Zhang and Wenjie Li and Min Zhang},
+      year={2024},
+      eprint={2407.19669},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2407.19669},
+}
+```
+
+
[^1]: Su, Jianlin, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. "Roformer: Enhanced transformer with rotary position embedding." Neurocomputing 568 (2024): 127063.

[^2]: Shazeer, Noam. "Glu variants improve transformer." arXiv preprint arXiv:2002.05202 (2020).