---
license: mit
---

## VHHBERT

VHHBERT is a RoBERTa-based model pre-trained on two million VHH sequences in [VHHCorpus-2M](https://huggingface.co/datasets/COGNANO/VHHCorpus-2M).
VHHBERT has the same model parameters as RoBERTa<sub>BASE</sub>, except that it uses positional embeddings with a length of 185 to cover the maximum sequence length of 179 in VHHCorpus-2M.
Further details on VHHBERT are described in our paper "A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models."

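Since the only architectural difference from RoBERTa<sub>BASE</sub> is the positional-embedding length, this can be checked against the published configuration. A minimal sketch, assuming the checkpoint ships a standard `transformers` `RobertaConfig`:

```python
from transformers import RobertaConfig

# Load the configuration published with the checkpoint.
config = RobertaConfig.from_pretrained("tsurubee/VHHBERT")

# Hidden size and depth match RoBERTa-BASE.
print(config.hidden_size, config.num_hidden_layers)

# Positional-embedding length, sized to cover the corpus maximum of 179.
print(config.max_position_embeddings)
```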
## Usage

The model and tokenizer can be loaded using the `transformers` library.

```python
from transformers import BertTokenizer, RobertaModel

tokenizer = BertTokenizer.from_pretrained("tsurubee/VHHBERT")
model = RobertaModel.from_pretrained("tsurubee/VHHBERT")
```

## Links

- Pre-training Corpus: https://huggingface.co/datasets/COGNANO/VHHCorpus-2M
- Code: https://github.com/cognano/AVIDa-SARS-CoV-2
- Paper: TBD

## Citation

If you use VHHBERT in your research, please cite the following paper.

```bibtex
TBD
```