AmelieSchreiber committed a93858d
1 parent: b9ac0c6

Update README.md

Files changed (1)
  1. README.md +65 -0
README.md CHANGED
@@ -1,3 +1,68 @@

---
license: mit
language:
- en
library_name: transformers
tags:
- esm
- esm-2
- protein
- binding-site
- biology
---

# ESM-2 for RNA Binding Site Prediction

A small model that predicts, for each residue of a protein sequence, whether that residue is part of an RNA binding site. It was trained on dataset "S1" from [Data of protein-RNA binding sites](https://www.sciencedirect.com/science/article/pii/S2352340916308022#s0035), building on [facebook/esm2_t12_35M_UR50D](https://huggingface.co/facebook/esm2_t12_35M_UR50D).
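
Because this is a token-classification model, it assigns a label to every token rather than a single label to the whole protein. The sketch below illustrates the shape of that computation with a stand-in head: one linear layer that maps each 480-dimensional ESM-2 embedding (the hidden size of esm2_t12_35M_UR50D) to two logits. The random hidden states and untrained weights are placeholders, not the released model. The `EsmForTokenClassification` head in transformers also applies dropout before its linear classifier, and dropout is inactive at inference time.

```python
import torch
from torch import nn

# Assumed sizes: esm2_t12_35M_UR50D embeddings are 480-dimensional,
# and this predictor uses two labels (0 = Not Binding Site, 1 = Binding Site).
HIDDEN_SIZE = 480
NUM_LABELS = 2

# Stand-in classification head with random (untrained) weights:
# one linear layer applied independently to every token embedding.
head = nn.Linear(HIDDEN_SIZE, NUM_LABELS)

# Placeholder encoder output: one sequence of 40 residues plus <cls> and <eos>
hidden_states = torch.randn(1, 42, HIDDEN_SIZE)

logits = head(hidden_states)        # shape: (batch, tokens, labels)
labels = logits.argmax(dim=-1)      # one predicted class per token
print(logits.shape, labels.shape)   # torch.Size([1, 42, 2]) torch.Size([1, 42])
```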

The model reaches a validation loss of `0.12738210861297214`.
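
The README does not say how this loss is computed. Assuming it is the mean per-token cross-entropy that `EsmForTokenClassification` computes (through `torch.nn.CrossEntropyLoss`, whose default `ignore_index=-100` skips positions labelled -100), exp(-loss) is the geometric mean of the probability the model assigns to the correct label, roughly 0.88 here. The pure-Python `mean_cross_entropy` below is an illustrative reimplementation of that formula, not the author's training code.

```python
import math

def mean_cross_entropy(logits, labels, ignore_index=-100):
    """Mean cross-entropy over labelled tokens.

    Tokens whose label equals `ignore_index` are skipped, mirroring the
    default behaviour of torch.nn.CrossEntropyLoss.
    """
    total, count = 0.0, 0
    for token_logits, label in zip(logits, labels):
        if label == ignore_index:
            continue
        # Negative log-softmax of the true class: logsumexp(logits) - logit[label]
        m = max(token_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in token_logits))
        total += log_z - token_logits[label]
        count += 1
    return total / count

# Two equal logits give each class probability 0.5, so the loss is ln 2
print(mean_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1]))  # ≈ 0.6931

# Reading the reported validation loss as a mean per-residue cross-entropy
val_loss = 0.12738210861297214
print(round(math.exp(-val_loss), 3))  # 0.88
```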

To predict binding sites for your own protein sequences, run:

```python
import torch
from transformers import AutoTokenizer, EsmForTokenClassification

# Define the class mapping
class_mapping = {
    0: 'Not Binding Site',
    1: 'Binding Site',
}

# Load the trained model and tokenizer
model = EsmForTokenClassification.from_pretrained("AmelieSchreiber/esm2_t12_35M_UR50D_rna_binding_site_predictor")
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")

# Define the new sequences
new_sequences = [
    'VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTK',
    'SQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWF',
    # ... add more sequences here ...
]

# Iterate over the new sequences
for seq in new_sequences:
    # Tokenize the sequence; the tokenizer wraps it as <cls> residues <eos>.
    # No padding is needed for a single sequence, and the attention mask is kept.
    inputs = tokenizer(seq, truncation=True, max_length=1290, return_tensors="pt")

    # Apply the model to get the logits (input IDs and attention mask)
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predictions by picking the label (class) with the highest logit
    predictions = torch.argmax(outputs.logits, dim=-1)

    # Drop the <cls> and <eos> positions so the predictions line up with the residues
    prediction_list = predictions[0, 1:-1].tolist()

    # Convert the predicted class indices to class names
    predicted_labels = [class_mapping[pred] for pred in prediction_list]

    # Pair each amino acid in the sequence with its predicted class label
    residue_to_label = list(zip(seq, predicted_labels))

    # Print out the list
    for i, (residue, predicted_label) in enumerate(residue_to_label):
        print(f"Position {i+1} - {residue}: {predicted_label}")
```
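
With two classes, taking the `argmax` of the logits is the same as labelling a residue as a binding site when its softmax probability for class 1 exceeds 0.5 (exact ties go to class 0). Working with that probability instead lets you choose a different cutoff. A minimal sketch on toy logits, not real model output; `binding_probabilities` and `predict_with_threshold` are hypothetical helper names, and the 0.7 cutoff is arbitrary:

```python
import torch

def binding_probabilities(logits: torch.Tensor) -> torch.Tensor:
    """Per-token probability of class 1 ('Binding Site') from logits of shape (batch, tokens, 2)."""
    return torch.softmax(logits, dim=-1)[..., 1]

def predict_with_threshold(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Label a token as a binding site (1) when its binding probability exceeds `threshold`."""
    return (binding_probabilities(logits) > threshold).long()

# Toy logits for three tokens (not real model output)
logits = torch.tensor([[[2.0, -1.0], [0.2, 0.4], [-3.0, 1.0]]])
print(predict_with_threshold(logits))                 # tensor([[0, 1, 1]])
print(torch.argmax(logits, dim=-1))                   # tensor([[0, 1, 1]])
print(predict_with_threshold(logits, threshold=0.7))  # tensor([[0, 0, 1]])
```

In the loop above, calling `predict_with_threshold(outputs.logits)` in place of `torch.argmax(outputs.logits, dim=-1)` returns predictions in the same `(batch, tokens)` layout, so the rest of the loop is unchanged.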
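
The loop above scores one sequence at a time. To score several sequences in one forward pass, you could tokenize them together with `padding=True` and call `model(**inputs)`. Each row then holds `<cls>`, one token per residue, `<eos>`, and trailing padding, assuming the tokenizer's default right-side padding. The hypothetical helper `labels_per_residue` below uses the attention mask to drop the special tokens and padding before pairing predictions with residues. It is shown on toy lists rather than real model output.

```python
def labels_per_residue(predictions, attention_mask, sequences, class_mapping):
    """Pair each residue with its predicted label for a right-padded batch.

    predictions and attention_mask are nested lists of shape (batch, tokens).
    Assumes each row is laid out as <cls>, one token per residue, <eos>, padding.
    """
    results = []
    for preds, mask, seq in zip(predictions, attention_mask, sequences):
        n_tokens = sum(mask)                   # real tokens, including <cls> and <eos>
        residue_preds = preds[1:n_tokens - 1]  # drop <cls>, <eos> and padding
        # zip stops at the shorter input, so residues cut off by truncation are skipped
        results.append([(res, class_mapping[p]) for res, p in zip(seq, residue_preds)])
    return results

class_mapping = {0: 'Not Binding Site', 1: 'Binding Site'}
preds = [[0, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
mask = [[1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
print(labels_per_residue(preds, mask, ["MK", "GAV"], class_mapping))
```

With a real batch, `predictions` would come from `torch.argmax(outputs.logits, dim=-1).tolist()` and `attention_mask` from `inputs["attention_mask"].tolist()`.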