qilowoq committed
Commit fcb396a · 1 Parent(s): c6293fc

Update README.md

Files changed (1)
  1. README.md +29 -2
README.md CHANGED
@@ -9,10 +9,37 @@ tags:
  - heavy chain
  - AbLang
  - CDR
+ - OAS
  ---

- Sentence embeddings can be produced as follows:
+ # AbLang model for heavy chains

  This is a huggingface version of AbLang: A language model for antibodies. It was introduced in
  [this paper](https://doi.org/10.1101/2022.01.20.477061) and first released in
- [this repository](https://github.com/oxpig/AbLang). This model is trained on uppercase amino acids: it only works with capital letter amino acids.
+ [this repository](https://github.com/oxpig/AbLang). This model is trained on uppercase amino acids: it only works with capital letter amino acids.
+
+
+ # Intended uses & limitations
+
+ The model can be used for protein feature extraction or fine-tuned on downstream tasks (TBA).
+
+ ### How to use
+
+ Here is how to use this model to get the features of a given protein sequence in PyTorch:
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained('qilowoq/AbLang_heavy')
+ model = AutoModel.from_pretrained('qilowoq/AbLang_heavy', trust_remote_code=True)
+
+ sequence_Example = ' '.join("QIHLVQSGTEVKKPGSSVTVSCKAYGVNTFGLYAVNWVRQAPGQSLEYIGQIWRWKSSASHHFRGRVLISAVDLTGSSPPISSLEIKNLTSDDTAVYFCTTTSTYDKWSGLHHDGVMAFSSWGQGTLISVSAASTKGPSVFPLAPSSGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSTQTYICNVNHKPSNTKVDKKVEPK")
+ encoded_input = tokenizer(sequence_Example, return_tensors='pt')
+ model_output = model(**encoded_input)
+ ```
+
+ Sentence embeddings can be produced as follows:
+
+ ```python
+ seq_embs = model_output.last_hidden_state[:, 0, :]
+ ```
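
The README snippet stops at the raw embedding tensor. As a follow-up, here is a minimal sketch, not part of the commit, of one way those sequence embeddings could be used: embedding two heavy chains and comparing them with cosine similarity. The two short sequences below are hypothetical placeholders, and the first-token pooling simply repeats the `last_hidden_state[:, 0, :]` line from the README.

```python
# Hedged sketch: pairwise similarity of heavy-chain embeddings, assuming the
# qilowoq/AbLang_heavy checkpoint and the pooling shown in the README above.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('qilowoq/AbLang_heavy')
model = AutoModel.from_pretrained('qilowoq/AbLang_heavy', trust_remote_code=True)
model.eval()

def embed(sequence: str) -> torch.Tensor:
    # AbLang expects uppercase amino acids, space-separated for the tokenizer
    encoded = tokenizer(' '.join(sequence.upper()), return_tensors='pt')
    with torch.no_grad():
        output = model(**encoded)
    # first-token embedding, as in the README snippet
    return output.last_hidden_state[:, 0, :]

# hypothetical (truncated) heavy-chain fragments, for illustration only
seq_a = "QIHLVQSGTEVKKPGSSVTVSCKAYGVNTFGLYAVN"
seq_b = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS"

similarity = torch.nn.functional.cosine_similarity(embed(seq_a), embed(seq_b))
print(f"cosine similarity: {similarity.item():.3f}")
```

Mean pooling over residue positions (with padding and special tokens masked out) is a common alternative to first-token pooling if the per-sequence vectors prove too noisy for a downstream task.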