---
license: bsd
tags:
- chemistry
- biology
- protein
- antibodies
- antibody
- heavy chain
- AbLang
- CDR
- OAS
---
# AbLang model for heavy chains
This is a Hugging Face version of AbLang, a language model for antibodies. It was introduced in
[this paper](https://doi.org/10.1101/2022.01.20.477061) and first released in
[this repository](https://github.com/oxpig/AbLang). The model was trained on uppercase amino acid letters, so input sequences must be written in capital letters.
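
Because of the uppercase-only requirement, it can help to normalize raw sequences before tokenizing them. The following is a minimal sketch, not part of the original AbLang code: the helper name `prepare_sequence` is hypothetical, and the check against the 20 standard amino acid letters is an assumption that may need adjusting to the tokenizer's actual vocabulary.

```python
# Assumption: only the 20 standard amino acid letters are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def prepare_sequence(raw: str) -> str:
    """Uppercase a raw amino acid string and separate residues with spaces."""
    # Drop any whitespace (spaces, newlines) and convert to capital letters.
    cleaned = "".join(raw.split()).upper()
    # Reject symbols outside the assumed alphabet rather than passing them on silently.
    unknown = sorted(set(cleaned) - VALID_RESIDUES)
    if unknown:
        raise ValueError(f"unexpected residue symbols: {unknown}")
    # One space between residues, matching the ' '.join(...) step in the usage example.
    return " ".join(cleaned)
```

For example, `prepare_sequence("qvql vqsg")` yields a string that can be passed to the tokenizer in place of a manually spaced sequence.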
# Intended uses & limitations
The model can be used for protein feature extraction or fine-tuned on downstream tasks (TBA).
### How to use
Here is how to use this model to get the features of a given protein sequence in PyTorch:
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('qilowoq/AbLang_heavy')
model = AutoModel.from_pretrained('qilowoq/AbLang_heavy', trust_remote_code=True)

# Separate residues with spaces so each amino acid is tokenized on its own
sequence_Example = ' '.join("QIHLVQSGTEVKKPGSSVTVSCKAYGVNTFGLYAVNWVRQAPGQSLEYIGQIWRWKSSASHHFRGRVLISAVDLTGSSPPISSLEIKNLTSDDTAVYFCTTTSTYDKWSGLHHDGVMAFSSWGQGTLISVSAASTKGPSVFPLAPSSGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSTQTYICNVNHKPSNTKVDKKVEPK")
encoded_input = tokenizer(sequence_Example, return_tensors='pt')
# Unpack the tokenizer output into keyword arguments (input_ids, attention_mask, ...)
model_output = model(**encoded_input)
```
Sequence-level ("sentence") embeddings can be taken from the hidden state of the first token:
```python
seq_embs = model_output.last_hidden_state[:, 0, :]
```